#210609686
lems by writing high-quality, maintainable, and robust code following software engineering best practices.
Participate in triaging, diagnosing, and resolving incidents, collaborating with others to address root causes.
Recognize toil in your role and proactively work to eliminate it through systems engineering or application code updates.
Understand observability patterns and strive to implement and improve service level indicators, objectives, monitoring, and alerting solutions for optimal transparency and analysis.
Required qualifications, capabilities, and skills
Formal training or certification in software engineering concepts and 2+ years of applied experience.
Demonstrated coding ability in at least one programming language, preferably Java or Python, and experience maintaining cloud-based infrastructure.
Familiar with site reliability engineering concepts and practices, including monitoring and alerting using tools like Grafana, Dynatrace, Prometheus, Datadog and Splunk.
Knowledge of building, deploying, and troubleshooting AI/ML models.
Knowledge of data science and the ability to ensure the security and scalability of ML systems.
Understand and work with containers or common server operating systems such as Linux and Windows.
Collaborate effectively in a large team, vocalizing ideas with peers and managers, and adapt work plans to changing responsibilities and projects.
Experience developing, debugging, and maintaining code in a large corporate environment with one or more modern programming languages and database querying languages.
Exposure to agile methodologies and related practices such as CI/CD, application resiliency, and security.
Emerging knowledge of software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, machine learning, etc.).
Preferred qualifications, capabilities, and skills