PhD internships at our company provide students with the opportunity to actively contribute to our products and research, and to the organization's future plans and emerging technologies. Our dynamic internship experience blends hands-on learning, enriching community-building and development events, and collaboration with industry experts.
Applications will be reviewed on a rolling basis, so we encourage you to apply early. Please state your availability clearly in your resume (start date and end date).
About the Role
We are building next-generation generative foundation models, with a strong focus on diffusion-based and unified generation-understanding architectures, deployed in privacy-sensitive, production environments.
This role sits at the intersection of:
- Large-scale model training systems
- GPU-first architecture and kernel-level optimization
- Diffusion / DiT / unified multimodal foundation models
- Privacy-preserving and compliant training pipelines
You will work on end-to-end training architecture design, from model-parallel execution and GPU efficiency to robust, fault-tolerant, privacy-aware training infrastructure.
Responsibilities:
You will work directly on core operators and system-level performance optimization for large-scale models, including but not limited to:
- Design and implement high-performance GPU kernels for core components such as Transformer, Attention, MoE, and Diffusion
- Perform end-to-end optimization for large model training workloads
- Conduct in-depth analysis of GPU execution bottlenecks, including compute, memory, and scheduling
- Use and extend Triton / CUDA / CUTLASS, and integrate optimized kernels with PyTorch / XLA / custom runtimes
- Collaborate closely with model research teams to translate new model architectures into efficient, production-ready implementations
- Reproduce, benchmark, and improve state-of-the-art system optimization techniques, validating gains in real training and inference settings
Qualifications
Minimum Qualifications
- Currently pursuing a PhD in computer science, computer engineering, or a related technical discipline
- Solid understanding of GPU architecture and execution models
- Proficiency in CUDA C++ or Triton, with the ability to independently write and optimize kernels
- Strong familiarity with Transformer / Attention computation patterns and performance bottlenecks
- Ability to read, reproduce, and reason about systems papers or open-source implementations
Preferred Qualifications
- Hands-on experience with large-scale model training
- Familiarity with PyTorch internals (e.g., Autograd, dispatcher, ATen)
- Experience with kernel profiling and performance tuning (e.g., Nsight, nvprof, nsys)
- Publications, open-source contributions, or performance benchmark results
By submitting an application for this role, you accept and agree to our global applicant privacy policy, which may be accessed here: https://careers.tiktok.com/legal/privacy