Research Scientist, Audio Language Models




Zürich, Switzerland


Position summary

Minimum qualifications:

  • PhD in Computer Science, Artificial Intelligence, Machine Learning, a related technical field, or equivalent practical experience.

  • Experience in Machine Learning Algorithms/Research and Language Modeling.

  • Experience with publications (e.g., NeurIPS, ICLR, ICML, ICASSP, INTERSPEECH) in Machine Learning or a related field.

  • Experience in Python.

Preferred qualifications:

  • Experience working in an industry or academic research lab, and taking research from concept to product.

  • Experience in areas such as speech, general audio processing, speech-to-speech or language modeling in audio.

  • Experience collaborating on or leading an applied research project.

About the job

As an organization, Google maintains a portfolio of research projects driven by fundamental research, new product innovation, product contribution and infrastructure goals, while providing individuals and teams the freedom to emphasize specific types of work. As a Research Scientist, you'll set up large-scale tests and deploy promising ideas quickly and broadly, managing deadlines and deliverables while applying the latest theories to develop new and improved products, processes, or technologies. From creating experiments and prototyping implementations to designing new architectures, our research scientists work on real-world problems that span the breadth of computer science, such as machine (and deep) learning, data mining, natural language processing, hardware and software performance analysis, improving compilers for mobile platforms, as well as core search and much more.

As a Research Scientist, you'll also actively contribute to the wider research community by sharing and publishing your findings, with ideas inspired by internal projects as well as from collaborations with research programs at partner universities and technical institutes all over the world.

Our Research team focuses on audio generative AI, driving the development of the foundational technology adopted by AudioLM, a state-of-the-art framework capable of generating speech, music and general sounds. We developed the core components of AudioLM, including SoundStream and SoundStorm, and directly contributed to models addressing a wide array of use cases, ranging from text-to-speech synthesis (e.g., SPEAR-TTS) and text-to-music (e.g., MusicLM) to speech-to-speech translation (e.g., AudioPaLM). We have ongoing collaborations with different product teams. This allows us to advance the state of the art through scientific publications while also bringing the outcomes of our research projects to Google products.

Google Research addresses challenges that define the technology of today and tomorrow. From conducting fundamental research to influencing product development, our research teams have the opportunity to impact technology used by billions of people every day.

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field -- we publish regularly in academic journals, release projects as open source, and apply research to Google products.

Responsibilities:


  • Explore how to extend AudioLM to support emerging use cases that require streaming-aware processing on any kind of audio. 

  • Design universal audio representations that capture all the information needed to drive the synthesis process regardless of the specific content type, while remaining compact enough to be suitable for language modeling.

  • Design all modeling components, from tokenizers to language models, so that stringent constraints can be met in terms of algorithmic and compute latency.

  • Implement models, design experiments and deploy proofs of concept in the area of audio generation to demonstrate universal audio synthesis capable of producing complex soundscapes in real time, encompassing speech, music and general sounds.

Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.

Why you should apply for a job at Google:

  • 56% say women are treated fairly and equally to men

  • 75% say the CEO supports gender diversity

  • Ratings are based on anonymous reviews by Fairygodboss members.

  • Generous parental and caregiver leave along with fertility and growing family support.

  • Flexible work options that include a hybrid work model, four “work from anywhere” weeks, and remote work opportunities.

  • A chance to be a part of a variety of employee resource groups, community groups, and culture clubs.