and improve the whole lifecycle of services, from inception and design through to deployment, operation and refinement.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health. Practice sustainable incident response and blameless postmortems.
- Establish engineering best practices for engineers as well as non-technical staff.
- Design and implement reliable, scalable, robust and extensible big data systems that support the core products and business.
Qualifications
Minimum Qualifications
- Bachelor's degree in Computer Science, a related technical field involving software or systems engineering, or equivalent practical experience.
- Experience with site reliability engineering, monitoring, and alerting for big data systems.
- Experience writing code in Java, Go, Python or a similar language.
Preferred Qualifications
- Knowledge of a variety of strategies for ingesting, modeling, processing, and persisting data, as well as ETL design, job scheduling and dimensional modeling.
- Familiarity with running production-grade services at scale and an understanding of cloud-native technologies and networking.
- Experience developing tools and APIs to reduce human interaction with systems and applications using a variety of coding and scripting standards.
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems is a plus (Hadoop, M/R, Hive, Spark, Presto, Flume, Kafka, ClickHouse, Flink or comparable solutions).
- Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.