SoC RAS Design Tech Lead, Machine Learning Accelerators

Google

3.8

(162)

Sunnyvale, CA

Why you should apply for a job at Google:

  • 56% say women are treated fairly and equally to men
  • 77% say the CEO supports gender diversity
  • Ratings are based on anonymous reviews by Fairygodboss members.
  • Generous parental and caregiver leave along with fertility and growing family support.
  • Flexible work options that include a hybrid work model, four “work from anywhere” weeks, and remote work opportunities.
  • A chance to be a part of a variety of employee resource groups, community groups, and culture clubs.
  • #103048070085649094

    Position summary


    • Experience with SOC subsystem-level logic redundancy design and test architecture.

    • Understanding of circuit-level SER (Soft Error Rate) modeling, measurement, and mitigation techniques.

    • Understanding of error coding techniques and design experience with ECC implementations.

    • Understanding of SDC, DUE and DCE, and associated metrics, analysis and calculations.

    About the job

    Be part of a diverse team that pushes boundaries, developing custom silicon solutions that power the future of Google's direct-to-consumer products. You'll contribute to the innovation behind products loved by millions worldwide. Your expertise will shape the next generation of hardware experiences, delivering unparalleled performance, efficiency, and integration.

    In this role, you will join a team building SOC designs for our data center accelerators. As a RAS (Reliability, Availability, Serviceability) SOC Design Technical Lead, you will own and lead the requirements definition, architecture, microarchitecture, and development of the SOC RAS features. This is a highly cross-functional role that requires a high level of coordination and co-design with our platform and system hardware counterparts. You will have experience in RAS, computer architecture, and logic design, and a propensity for leading multi-faceted efforts involving many stakeholders.

    Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.

    The US base salary range for this full-time position is $221,000-$314,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

    Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

    Responsibilities

    • Define the architecture and microarchitecture of RAS features of TPU SOCs.

    • Lead the design and implementation of the RAS features.

    • Collaborate with the Platform team to co-design the SOC-level RAS requirements.

    • Be responsible for setting the DCE (Detectable and Correctable Errors), DUE (Detected but Unrecoverable Errors), and SDC (Silent Data Corruption) goals, as well as the DPPM goals, for TPUs.
