#721909BR
ng work including infrastructure needs, testing, failover solutions, failure mitigation, and much more
Continuously test application and infrastructure resiliency under a variety of error conditions.
Support the compliance and security integrity of the environment
Develop, communicate, and monitor standard processes to promote the long-term sustainability and health of operational development tasks.
Stand up and maintain pre-production and developer environments to support the entire development organization and improve overall team velocity
Use metrics and analytics to identify reliability issues and eliminate them through automation and tooling
Be an advocate for our customers, providing them with self-diagnostic tools to resolve common issues that arise in the field
Participate in code reviews of your peers' development work, triage and resolve live customer issues, and take part in all Scrum activities
Additionally, monitor, measure, and improve code and data performance for the application you help to develop
Be available for on-call shifts during daytime hours and on weekends
All of this takes place in a collaborative team environment that requires strong communication
Required Technical and Professional Expertise
4-8 years of experience delivering code for active Cloud Services/Projects
Experience debugging complex problems
Experience designing, building, and operating large-scale production systems
Expertise in Ansible, Bash, core Python development, and deployments in production environments is a must.
Experience automating infrastructure, configuration management, testing, and deployments using tools such as Ansible and Chef, and the ability to explain the Infrastructure as Code paradigm
A strong understanding of diverse infrastructure platforms and concepts is required.
Systems management experience in Linux/UNIX systems (RHEL preferred)
Experience with Docker and containerization technologies
Experience with cloud computing technologies
Experience with Kubernetes (k8s) custom resource definitions (CRDs) and k8s controller programming using the watch/informer model
Must have solid experience in infrastructure operations automation and IT Service Management, with hands-on exposure to data center administration, configuration, incident management, and support
5+ years of working experience with one or more operating systems: Ubuntu (preferred), RHEL, CentOS Linux, and Windows Server
Strong experience with one or more virtualization technologies: KVM, Xen, Citrix Hypervisor, VMware vSphere, etc.
Working knowledge of one or more scripting or programming languages: Bash, PowerShell, Python, Ruby, and Go.
Strong communication skills
Preferred Technical and Professional Expertise