#req10866
dashboards and maintain the ELK stack to ensure high availability and performance of logging infrastructure.
Integrate metrics, logs, and traces into a unified observability platform.
Build and maintain alerting pipelines that improve the signal-to-noise ratio for production incidents.
Contribute to infrastructure automation using tools such as Terraform and Helm.
Set up and support CI/CD pipelines for automated testing, deployment, and rollback across multiple environments.
Participate in shift rotations and continuously improve observability and response systems.
You've Got What It Takes If You Have...
2+ years in an SRE, DevOps, or Infrastructure Engineer role.
Bachelor's degree in computer science, IT, or related technical field.
Hands-on experience with AWS and GCP.
Deep hands-on experience with Kubernetes (EKS, AKS, GKE).
Strong understanding of Linux internals, container orchestration, and microservice architecture.
Hands-on experience with monitoring/logging tools:
Prometheus, Grafana, InfluxDB
ELK stack (Elasticsearch, Logstash, Kibana)
Proficiency with incident response and alerting tools (e.g., PagerDuty).
Basic knowledge of:
Kafka - topic monitoring, consumer health
ElastiCache / Redis - caching patterns and troubleshooting
InfluxDB - time-series metrics storage
Experience writing and maintaining automation scripts in Bash, Python, or Go.
#LI-Onsite