100% say women are treated fairly and equally to men
100% would recommend this company to other women
100% say the CEO supports gender diversity
Ratings are based on anonymous reviews by Fairygodboss members.
Employee well-being is supported via hybrid work, short-term counseling through our EAP and a premium subscription to Headspace.
We embrace diversity across all dimensions and provide employees with 9 employee resource groups globally, including our WOMEN ERG.
Comprehensive parental leave policy as well as fertility treatment through healthcare providers with a $20,000 lifetime maximum.
#7542714368766396680
Position summary
uests, achieving large-scale improvements in resource usage efficiency and global optimality;
Responsible for preemption and rescheduling mechanisms for services with different priorities, and for managing automatic resource multiplexing across different clusters and resource types; handle scheduling and load adaptation across multi-datacenter, multi-region, and multi-cloud environments.
Building Training System Architecture for Next-Generation Ultra-Large and Ultra-Deep Recommendation Models:
Develop a flexible, elastic and robust distributed training runtime focused on hyper-scaled embeddings and large-scale GPU training;
Design and optimize distributed computing APIs and runtimes geared towards future recommendation and ads model paradigms (e.g., reinforcement learning, fine-tuning and/or distillation);
Collaborate with platform teams to enhance the diagnosability and usability of distributed training systems.
Constructing Online Orchestration Architecture for Next-Generation Recommendation Systems:
Build a robust distributed model inference architecture for online learning scenarios involving hyper-scaled embeddings;
Optimize the usability of online recommendation and ads model architectures and MLOps workflows.
Qualifications
Minimum Qualifications
Bachelor's degree or above, majoring in Computer Science, Engineering or related fields.
Strong programming experience with at least one modern language such as Golang or Python.
Experience contributing to large-scale distributed systems or multi-tenant systems (architecture, reliability and scaling).
Strong analytical and problem-solving abilities.
Good communication skills, self-motivation, and sound engineering and documentation practices.
At least 3 years of relevant experience.
Preferred Qualifications
Familiar with large-scale distributed scheduling systems such as Kubernetes, YARN, Flink and/or Spark.
Familiar with open-source orchestration frameworks such as VeRL, vLLM, Ray or TFX.
Why you should apply for a job at TikTok:
4.5/5 in overall job satisfaction
4.5/5 in supportive management