#21272_R-73913
tive people in the world, and then give them the support to constantly innovate, iterate and serve consumers more directly and personally. Our teams are innovative, diverse, multidisciplinary and collaborative, taking technology into the future and bringing the world with it.
WHAT YOU WILL WORK ON
A High Availability Engineer's responsibilities in this role include, but are not limited to:
Plan and lead chaos engineering exercises to expose weaknesses in Nike production systems, uncover gaps in monitoring and observability, and put in place solutions that reduce the impact of failures on consumers
Partner with infrastructure and application engineering teams to ensure solutions meet high availability/disaster recovery requirements, including gap identification, assessment, and remediation
Assess the current disaster recovery strategy, its impacts, and its risks from business, legal, and IT perspectives
Propose various application design patterns and develop disaster recovery scenarios/resilience requirements
Act as a subject matter expert to system owners on industry standards and best practices for disaster recovery
Lead technical recovery efforts in the event of a site outage
WHO WE ARE LOOKING FOR
Within Reliability Engineering, our goal is to provide technical solutions to complex production problems, with a focus on reducing incident and problem toil, speeding detection and recovery of critical incidents through observability, and driving continuous improvement through operational health measurement and sharing.
Requires a bachelor's degree in Computer Science, Engineering, IT or a related field; MBA a plus. Minimum of 7 years of relevant work experience.
Ability to lead assessments of applications and infrastructure components to identify gaps related to high availability/disaster recovery
2-4 years of software development experience
2-4 years of experience building cloud-based enterprise systems, ideally on AWS
Proficient in Java 8 and newer
Proficient with JavaScript on frontend (React, Angular, etc.) and backend (Node.js) components.
Demonstrable knowledge of Linux operating system internals, TCP/IP, filesystems, disk/storage technologies
Basic understanding of DNS, networking, and virtualization
Experience with Docker and/or serverless patterns.
Expertise in designing and building scalable microservices
Expertise in web and web-app patterns
Expertise in NoSQL datastore systems to build highly scalable solutions
Experience with, or expertise in, other modern enterprise languages (functional or otherwise: Scala, Python, Golang, etc.)
Experience securing RESTful APIs and apps using OAuth, OpenID Connect, and JWT
Good understanding of asynchronous/non-blocking RESTful API approaches and frameworks
Experience with messaging (pub-sub) patterns
Demonstrated negotiation and influencing skills
Basic understanding of most of the following: ServiceNow, Jira, Jenkins, GitHub, Splunk, New Relic, or equivalent Application Performance Monitoring Tool
WHO YOU WILL WORK WITH
Collaborate with and consult Architecture, Security, Privacy, and other subject matter experts within Nike
Act as the subject matter expert for resilience and high availability of consumer facing (e-commerce) systems/applications
Communicate to leadership the status of recovery time objectives (RTO) and recovery point objectives (RPO) for critical systems