Technical Senior Principal | Site Reliability Engineer | Kubernetes

GM Financial

Arlington, VA

Why you should apply for a job at GM Financial:

  • 4/5 in overall job satisfaction
  • 5/5 in supportive management
  • Ratings are based on anonymous reviews by Fairygodboss members.
  • We offer 12 weeks of paid parental leave for our team members to care for and bond with their new family member.
  • Our Women's Inspiration Network (WIN) supports the recruitment, retention and professional development of women across our organization.
  • Our programs provide the support, flexibility and resources for women returning to their careers after a break.
  • #1151

    Position summary

    We are changing the way we use technology to support our customers, dealers and business.

    Flexible hybrid work environment (onsite 3 days a week/2 days remote) at our Arlington (AOC1), TX office.

    RESPONSIBILITIES

    About the Role
    As a Senior Principal SRE, you will be the technical bar‑raiser for our centralized Kubernetes platform: setting strategy, owning reliability at fleet scale, and leading cross‑org engineering to deliver a self‑service, secure, and compliant platform. You will partner with Architecture, BPS, Cloud Ops, and Cyber to turn our roadmap into durable, automated capabilities that product teams adopt with minimal toil.

    Top Outcomes You Will Drive

    • Fleet‑level reliability strategy for shared and dedicated clusters, defining SLOs/SLIs and error budgets for the platform and golden patterns, with automated enforcement and reporting.

    • Self‑service at scale: deliver Namespace‑as‑a‑Service and developer‑portal workflows that shrink onboarding from weeks to hours and unlock safe autonomy for product teams.

    • Observability by default: land built‑in cluster/workload dashboards (Splunk APM + Azure Monitor/App Insights) and a robust RCA/Problem‑Management loop that closes the gap between incidents and engineering improvements.

    • Multi‑cloud readiness: guide centralized Kubernetes deployment expansion to AWS and design portable patterns (identity, networking, GitOps) that remain cloud‑agnostic.

    • Secure networking & policy: lead adoption of Calico Enterprise (DNS‑based policy, honey pods, central policy management) and staged rollout of stretched mesh/identity‑based access across clusters.

    • Path to serverless Kubernetes: influence the architecture that abstracts K8s, integrates pre‑connected services, and enforces governance/consistency with a service catalog and on‑demand APIs.

    • Scale the operating model: codify the RACI, reduce reactive workload, shift‑left with support enablement, and build automation that lets a small core team support a large fleet.

    Core Responsibilities

    • Own multi‑cluster reliability: capacity modeling, failure domain strategy, upgrade design (blue/green, surge, or secondary‑cluster) and chaos/DR exercises across shared & dedicated environments.

    • Define and implement platform SLOs/SLIs (control plane, base stack, onboarding, GitOps, network policy propagation, secret/cert rotation) with automated alerts and error‑budget policies.

    • Lead the design/implementation of Namespace‑as‑a‑Service; measure adoption, lead time, and customer effort score.

    • Establish GitOps standards (Argo CD) for app and cluster configuration, including bootstrap, drift detection, and progressive delivery (blue/green, canary).

    • Architect and land Calico/Tigera Enterprise and/or service mesh patterns (east‑west controls, identity‑based policies, multi‑cluster traffic management), with guardrails and paved‑road configs.

    • Lead security & compliance by default: SR controls, RBAC baselines (Azure RBAC/workload identity), cert‑manager automation, patch cadence, and auditable change pipelines.

    • Serve as principal‑level incident commander and RCA owner for platform incidents; convert findings into backlog items, patterns, and training.

    • Partner with the necessary teams to scale operations and refine RACI; implement charge/show‑back models for high‑touch migrations when appropriate.

    • Mentor Staff/Principal engineers; raise the bar on design docs, ADRs, runbooks, and knowledge sharing across the platform and product teams.

    QUALIFICATIONS

    What makes you a dream candidate?

    Knowledge and Skills

    • Deep experience with GitOps (Argo CD), service mesh (Istio/Linkerd), Calico/Tigera, cert‑manager, secret engines, and workload identity.

    • Strong IaC/automation: Terraform, Azure DevOps (YAML), CI/CD policy gates, automated security controls.

    • Observability at scale: Splunk APM, Azure Monitor, Application Insights; golden dashboards and SLO pipelines.

    • Distributed systems fundamentals: performance, scalability, capacity, and reliability.

    • Excellent communication; ability to lead across org boundaries and mentor senior engineers.

    Experience and Education

    • High School Diploma or equivalent required

    • Bachelor's Degree or Associate Degree plus 2 additional years of relevant experience required

    • 12+ years in related function(s) required

    • 5-7 years of experience leading through mentorship in related field required

    • 5-7 years of experience driving thought leadership and innovation across products required

    Preferred Skills

    • Multi‑cluster and multi‑region upgrade strategies (surge/blue‑green), active‑active patterns, and zero‑downtime migrations.

    • Network policy at scale (DNS‑based policies), L7 authorization, east‑west security controls.

    • Self‑service developer portals and onboarding workflows; measuring adoption and customer effort.

    • FinOps for Kubernetes (charge/show‑back, pod‑level cost breakdown), quota guardrails, and capacity/right‑sizing automation.

    • Experience with Kubernetes platform abstraction and curated service catalogs.

    • Expert in SRE: SLO/SLI design, error budgets, incident command, RCA/Problem Management, chaos/DR.

    What We Offer: Generous benefits package available on day one to include: 401K matching, bonding leave for new parents (12 weeks, 100% paid), tuition assistance, training, GM employee auto discount, community service pay and nine company holidays.
    Our Culture: Our team members define and shape our culture: an environment that welcomes innovative ideas, fosters integrity, and creates a sense of community and belonging. Here we do more than work; we thrive.
    Compensation: Competitive pay and bonus eligibility
    Work-Life Balance: Flexible hybrid work environment, 2 days a week in office