#22331_R-237932
d solutions help individuals, financial institutions, governments, and businesses realize their greatest potential.
Our decency quotient, or DQ, drives our culture and everything we do inside and outside of our company. With connections across more than 210 countries and territories, we are building a sustainable world that unlocks priceless possibilities for all.
Overview:
Dynamic Yield by Mastercard is seeking a Site Reliability Engineer who excels at bridging the gap between infrastructure and development. In this role, you will work closely with engineering teams to ensure the reliability, scalability, and performance of our systems. A strong emphasis will be placed on observability: designing and implementing effective monitoring, logging, tracing, and alerting solutions that provide deep visibility into system behavior. You should be comfortable collaborating with developers, presenting technical insights, and helping shape best practices. Your responsibilities will include incident management, automating and improving our observability solutions, and continuous performance tuning so that our platform can scale and evolve with our business needs.
Role:
• Ensure production systems meet or exceed established SLAs and SLOs by actively maintaining and enhancing system performance and uptime.
• Design and maintain end-to-end observability systems (including monitoring, logging, and distributed tracing) to detect anomalies and enable proactive issue resolution.
• Work closely with engineering teams to improve how their applications are monitored and alerted on. Help define meaningful alerts, reduce noise, and ensure developers are accountable for the operational health of their services.
• Optimize application performance on Kubernetes through resource tuning, scaling strategies, and deep performance analysis.
All about you:
• 5+ years in SRE, DevOps, or Production Engineering roles
• Deep expertise in AWS, Kubernetes, and Linux
• Experience deploying and tuning monitoring tools such as Prometheus, Thanos, or other time-series databases for storing metrics.
• Experience owning logging with the ELK stack, Loki, Grafana, or alternatives.
• Experience with tracing tools such as OpenTelemetry, Tempo, and Jaeger.
• Strong understanding of incident management processes and best practices.
• Experience with automation tools and practices for deployment and infrastructure management.
• Excellent communication and collaboration skills, with the ability to work effectively in a team environment.
• Ownership mindset; proactive and reliable.
Corporate Security Responsibility
All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization; therefore, every person working for, or on behalf of, Mastercard is expected to be responsible for information security and must: