#req10062
es for all production systems and document the knowledge base
Administer Incident Management activities (detect, record, classify, and close) and provide timely escalations and notifications as required by procedure.
Design, build, and maintain scalable, reliable, and secure infrastructure.
Participate in on-call rotation to respond to cloud-related incidents and emergencies.
Troubleshoot and resolve complex technical issues in a timely manner.
Monitor and optimize cloud infrastructure for performance, cost, and security.
Collaborate with cross-functional teams to troubleshoot and resolve complex cloud-related issues.
Mentor junior team members and provide technical guidance and support.
You've Got What It Takes If You Have...
Minimum of a bachelor's degree in computer science, engineering, or a related field.
5+ years of experience in cloud operations.
Strong communication and collaboration skills.
Excellent troubleshooting and problem-solving skills.
Comprehensive understanding of cloud computing principles and architectures.
Extensive experience in Linux/Unix environments.
Proficiency in containerization technologies like Docker and Kubernetes.
Strong scripting skills in Python or Bash.
Proficient in debugging and optimizing Java-based applications.
Hands-on experience in deploying, optimizing, and troubleshooting applications on Tomcat and JBoss application servers.
Hands-on experience in managing and optimizing Memcached, Nginx, ActiveMQ, Elasticsearch, and Redis applications.
Experience with monitoring and logging tools such as New Relic and the ELK stack.
Sound knowledge of networking concepts, including TCP/IP, DNS, and VPN.
Proficient in automation and configuration management tools like Ansible, Jenkins, and Bitbucket.
Thorough understanding of monitoring and alerting tools such as Nagios, New Relic, Grafana, and Checkmk.
Experience with distributed storage technologies such as NFS, NetApp, and Amazon S3, as well as dynamic resource management frameworks (e.g., Kubernetes).
Experience working in data center and AWS cloud environments.
#LI-Onsite