Job Description

Job Responsibilities:

  • Optimize large model training and inference services for performance, acceleration, dynamic scaling, fault tolerance, and stability
  • Study how to implement and engineer known algorithms, principles, requirements, and solutions
  • Track and improve industry open source solutions
  • Work with internal teams to implement solutions on schedule
  • Support or collaborate with other teams to complete online environment deployment
  • Support the production environment, including fault analysis and resolution

Qualifications

  • Proficient in Python and PyTorch programming; Linux shell and Go are a plus
  • Experienced with Linux, Docker, and Kubernetes
  • Experience with machine learning, neural networks, deep learning, and model training
  • Understanding of the Transformer model architecture and LLMs
  • Experienced with distributed model training frameworks and the Hugging Face training library
  • NLP experience is a plus
  • Experience with NVIDIA GPUs, CUDA, NCCL, TensorRT, RDMA, RoCE, and high-throughput GPU clusters
  • Experienced in cloud-native development, AWS, EKS, and EC2
  • Understanding of acceleration techniques for LLM training and inference on GPUs and distributed GPU clusters, such as FlashAttention, PagedAttention, and continuous batching
  • Experienced with inference engine solutions: prefill, decoding, quantization, speculative decoding, etc.
  • Experienced in inference cluster management and dynamic scaling on Kubernetes

Ways of Working

Our structured hybrid approach is centered around our offices and remote work environments. The work style of each role (Hybrid, Remote, or In-Person) is indicated in the job description/posting.

Benefits

As part of our award-winning workplace culture and commitment to delivering happiness, our benefits program offers a variety of perks, benefits, and options to help employees maintain their physical, mental, emotional, and financial health; support work-life balance; and contribute to their community in meaningful ways.

About Us

Zoomies help people stay connected so they can get more done together. We set out to build the best collaboration platform for the enterprise, and today help people communicate better with products like Zoom Contact Center, Zoom Phone, Zoom Events, Zoom Apps, Zoom Rooms, and Zoom Webinars.

We’re problem-solvers, working at a fast pace to design solutions with our customers and users in mind. Here, you’ll work across teams to deliver impactful projects that are changing the way people communicate, and you’ll enjoy opportunities to advance your career in a diverse, inclusive environment.

Our Commitment

We believe that the unique contributions of all Zoomies are the driver of our success. To make sure that our products and culture continue to incorporate everyone's perspectives and experience, we never discriminate on the basis of race, religion, national origin, gender identity or expression, sexual orientation, age, or marital, veteran, or disability status. Zoom is proud to be an equal opportunity workplace and is an affirmative action employer. All your information will be kept confidential according to EEO guidelines.

We welcome people of different backgrounds, experiences, abilities and perspectives including qualified applicants with arrest and conviction records and any qualified applicants requiring reasonable accommodations in accordance with the law.

If you need assistance navigating the interview process due to a medical disability, please submit an Accommodations Request Form and someone from our team will reach out soon. This form is solely for applicants who require an accommodation due to a qualifying medical disability. Non-accommodation-related requests, such as application follow-ups or technical issues, will not be addressed.

Company Info.

Zoom Video Communications

Zoom helps businesses and organizations bring their teams together in a frictionless environment to get more done. Our easy, reliable cloud platform for video, phone, content sharing, and chat runs across mobile devices, desktops, telephones, and room systems.

  • Industry: Media, Information Technology
  • No. of Employees: 5,251
  • Location: 55 Almaden Blvd, San Jose, CA 95113, USA

Zoom Video Communications is currently hiring for Machine Learning Engineer roles in Zhejiang, China, with an average base salary of ¥250,500 - ¥500,500 per year.
