Applications are reviewed on a rolling basis - we encourage you to apply early.
Successful candidates must be able to commit to an internship period of at least 3 months.
Responsibilities
- Optimize model performance and memory efficiency on GPU-based systems.
- Collaborate with research and infra teams to deploy high-throughput training and inference pipelines.
- Develop tools and libraries to accelerate deep learning workloads at scale.
- Analyze system performance (e.g. GPU profiling, kernel analysis, throughput tuning).
Qualifications
Minimum Qualifications:
- Undergraduate or postgraduate student currently pursuing a bachelor's or master's degree in Computer Science, Electrical Engineering, or a related field.
- Familiarity with deep learning and frameworks such as PyTorch or TensorFlow, and an understanding of GPU basics (memory hierarchy, compute kernels).
- Programming experience in Python, C++, CUDA, or Triton is a plus.
Preferred Qualifications:
- Course or project experience with LLMs or recommender systems.
- Exposure to CUDA, NCCL, or deep learning compiler stacks (e.g. XLA, TensorRT).
- Publications or open-source contributions are a bonus.
By submitting an application for this role, you accept and agree to our global applicant privacy policy, which may be accessed here: https://careers.tiktok.com/legal/privacy
If you have any questions, please reach out to us at [email protected]