#210663092
ng, and root cause analysis to resolve production issues and improve system stability.
Build and maintain CI/CD pipelines using Jenkins (including global libraries), and implement infrastructure as code with Terraform.
Develop and support containerized applications using Docker and Kubernetes, ensuring robust deployments and scalability.
Implement and maintain observability solutions using tools such as Grafana, Prometheus, Splunk, and OpenTelemetry.
Collaborate with engineering and support teams to drive continuous improvement and operational excellence.
Participate in on-call rotation, responding to production incidents and ensuring timely resolution.
Required qualifications, capabilities, and skills
Formal training or certification in Site Reliability Engineering concepts and 3+ years of applied experience.
Experience in SRE, DevOps, or application support roles, with knowledge of SLIs/SLOs, incident response, and troubleshooting.
Familiarity with monitoring and observability tools (e.g., Grafana, Prometheus, Splunk, OpenTelemetry).
Hands-on experience with CI/CD pipelines (Jenkins, including global libraries), infrastructure as code (Terraform), version control (Git), containerization (Docker), and orchestration (Kubernetes).
Exposure to cloud platforms (AWS, GCP, or Azure) and to automating infrastructure and deployments.
Willingness to participate in on-call rotation and respond to production incidents.
Ability to break down issues, document solutions, and communicate effectively with team members and customers.
Preferred qualifications, capabilities, and skills