PhD internships at our company provide students with the opportunity to actively contribute to our products and research, and to the organization's future plans and emerging technologies. Our dynamic internship experience blends hands-on learning, enriching community-building and development events, and collaboration with industry experts.
Applications will be reviewed on a rolling basis, so we encourage you to apply early. Please state your availability clearly in your resume (start date and end date).
About the Role
We are building next-generation generative foundation models, with a strong focus on diffusion-based and unified generation-understanding architectures, deployed in privacy-sensitive, production environments.
This role sits at the intersection of:
- Large-scale model training systems
- GPU-first architecture and kernel-level optimization
- Diffusion / DiT / unified multimodal foundation models
- Privacy-preserving and compliant training pipelines
You will work on end-to-end training architecture design, from model-parallel execution and GPU efficiency to robust, fault-tolerant, privacy-aware training infrastructure.
Responsibilities:
You will work directly on core operators and system-level performance optimization for large-scale models, including but not limited to:
- Design and implement high-performance GPU kernels for core components such as Transformer, Attention, MoE, and Diffusion
- Perform end-to-end optimization for large model training workloads
- Conduct in-depth analysis of GPU execution bottlenecks, including compute, memory, and scheduling
- Use and extend Triton / CUDA / CUTLASS, and integrate optimized kernels with PyTorch / XLA / custom runtimes
- Collaborate closely with model research teams to translate new model architectures into efficient, production-ready implementations
- Reproduce, benchmark, and improve state-of-the-art system optimization techniques, validating gains in real training and inference settings
Qualifications
Minimum Qualifications
- Currently pursuing a PhD in computer science, computer engineering, or a related technical discipline
- Solid understanding of GPU architecture and execution models
- Proficiency in CUDA C++ or Triton, with the ability to independently write and optimize kernels
- Strong familiarity with Transformer / Attention computation patterns and performance bottlenecks
- Ability to read, reproduce, and reason about systems papers or open-source implementations
Preferred Qualifications
- Hands-on experience with large-scale model training
- Familiarity with PyTorch internals (e.g., Autograd, dispatcher, ATen)
- Experience with kernel profiling and performance tuning (e.g., Nsight, nvprof, nsys)
- Publications, open-source contributions, or performance benchmark results
By submitting an application for this role, you accept and agree to our global applicant privacy policy, which may be accessed here: https://careers.tiktok.com/legal/privacy