The team also uses frameworks like Ray to orchestrate large-scale distributed ML workflows.
Responsibilities:
- Design and develop core Flink operators, connectors, or runtime modules to support TikTok's exabyte-scale real-time processing needs.
- Build and maintain low-latency, high-throughput streaming pipelines powering online learning, recommendation, and ranking systems.
- Collaborate with ML engineers to design end-to-end real-time ML pipelines, enabling efficient feature generation, training data streaming, and online inference.
- Leverage Velox for compute-optimized ML data transformation and training acceleration on multimodal datasets (e.g., video, audio, and text).
- Use Ray to coordinate distributed machine learning workflows and integrate real-time feature pipelines with ML model training/inference.
- Optimize Flink job performance, diagnose bottlenecks, and deliver scalable solutions across EB-scale streaming workloads.
Qualifications
Minimum Qualifications:
- Currently pursuing a PhD in Computer Science, Software Engineering, Data Engineering, or a related technical field.
- Strong programming skills in Java, Scala, or Python.
- Understanding of distributed systems, stream processing, and event-driven architecture.
- Familiarity with system design concepts such as fault tolerance, backpressure, and horizontal scalability.
- Demonstrated ability to debug and analyze complex distributed jobs in production environments.
Preferred Qualifications:
- Graduating in December 2025 or later, with the intent to return to your academic program.
- Experience with Apache Flink, Spark Streaming, or Kafka Streams.
- Hands-on experience with Ray for distributed ML or workflow orchestration.
- Familiarity with Velox, Arrow, or similar columnar execution engines for training/feature pipelines.
- Understanding of multimodal data processing (e.g., combining video, audio, and text in model training pipelines).
- Experience working with data lake ecosystems (e.g., Iceberg, Hudi, Delta Lake) and cloud-native storage at PB-EB scale.
- Contributions to open-source projects or participation in ML/engineering hackathons or competitions.