#6280257
ses to various kinds of incident scenarios in collaboration with the Service Reliability Engineering (SRE) team and client stakeholders, and prepare runbooks
You will reduce the human effort in day-to-day operations by automating operations, using the latest tech stacks befitting the task and improving the overall efficiency of the entire team as time progresses
You will be the first responder to incidents in production/other high-value environments and execute the appropriate response as established by runbooks or based on your judgment of incidents
You will initiate or establish communication with the Level 3 support team, set up war rooms for incident response, and coordinate with tech leads, SRE leads and development teams to resolve incidents as necessary
You will prepare incident root cause analysis (RCA) and postmortem reports, explaining analyses and outlining preventive measures to clients. Working with SRE and development teams or independently, you will ensure clear communication and proactive steps to prevent future incidents
You will implement service/product reliability improvement in collaboration with service reliability engineers by writing infrastructure/observability configuration code
Job qualifications
Technical Skills
You have hands-on experience in using CI/CD tools such as Jenkins, CircleCI or GitLab for executing deployments
You have knowledge of Infrastructure as Code (IaC) tech stacks such as Terraform, Ansible, ARM or CloudFormation to provision and manage infrastructure
You have working experience in using observability tools for logging, monitoring, tracing and alerting, e.g.: Datadog, Prometheus/Grafana, ELK/EFK/Splunk
You have experience in supporting at least one public cloud, e.g.: AWS, Azure or GCP
You have hands-on experience executing the most common operations for managing workloads on any container ecosystem tech stack, e.g.: Docker, Kubernetes, OpenShift, etc.
You understand system performance tuning and scaling to handle common heavy load scenarios along with concepts of highly available systems and basics of disaster recovery solutions, and are familiar with failover, backup and recovery concepts
You have experience operating a Linux OS such as RHEL or a Debian-based OS and are familiar with the most common Linux OS operations and commands, reading and tweaking Bash scripts and managing runtime environment configuration such as environment variables, logs, etc.
You have experience supporting backend storage solutions such as SQL and NoSQL databases, e.g.: Postgres and MongoDB, and caching solutions such as Redis and Memcached
You have experience in networking configuration and security, and are familiar with common networking setup and security practices, e.g.: load balancing, proxies, transport layer security (TLS) and certificate management, and have an understanding of standard network protocols and configurations
You have a good understanding of fundamental concepts of APIs such as request, response, headers, authentication, JSON payloads, etc.
Professional Skills
You have strong communication and articulation skills, are proficient in English and able to confidently hold a Q&A discussion with senior stakeholders
You have people skills with an emphasis on close collaboration with multiple, cross-functional teams from the client side or Thoughtworks
You have the ability to work under pressure and with composure during production incidents
You have strong analysis, deduction and reasoning skills, with the ability to identify patterns in data and draw conclusions
You have strong drive and ownership to sign up for and deliver work when called upon without being too concerned with role boundaries
You are willing to be part of a rotation- and need-based 24x7 available team
Other things to know
Learning & Development
There is no one-size-fits-all career path at Thoughtworks: however you want to develop your career is entirely up to you. But we also balance autonomy with the strength of our cultivation culture. This means your career is supported by interactive tools, numerous development programs and teammates who want to help you grow. We see value in helping each other be our best and that extends to empowering our employees in their career journeys.
Job Details
Country: Romania
City: Cluj
Date Posted: 11-20-2024
Industry: Information Technology
Employment Type: Regular
About Thoughtworks
Thoughtworks is a global technology consultancy that integrates strategy, design and engineering to drive digital innovation. For 30+ years, our clients have trusted our autonomous teams to build solutions that look past the obvious. Here, computer science grads come together with seasoned technologists, self-taught developers, midlife career changers and more to learn from and challenge each other. Career journeys flourish with the strength of our cultivation culture, which has won numerous awards around the world.
Join Thoughtworks and thrive. Together, our extra curiosity, innovation, passion and dedication overcomes ordinary.