ion of computing resources, minimizing response time, and maximizing throughput.
- Cross-Team Collaboration: Work closely with algorithm and business teams to facilitate the deployment of models into production and resolve issues that arise in the production environment.
- Technical Innovation: Continuously track and evaluate new technologies and methods in the AI field to advance our model-serving capabilities.
Qualifications
Minimum Qualifications:
- Bachelor's degree or higher in Computer Science, Software Engineering, Artificial Intelligence, or related fields.
- 3+ years of relevant work experience, including experience deploying and serving large-scale machine learning models.
- Proficiency in mainstream deep learning frameworks (such as TensorFlow, PyTorch, DeepSpeed) and their deployment in production environments.
- Familiarity with model inference optimization techniques, such as quantization, distillation, distributed inference, ONNX, and ZeRO.
- Familiarity with online service tech stacks, such as RPC, Redis, and Kafka.
- Strong programming skills, with proficiency in Python, C++, or Golang, and a deep understanding of system performance optimization.
Preferred Qualifications:
- Experience deploying and optimizing large language models (LLMs).