Software Engineer, Systems ML - HPC Specialist

Meta Platforms, Inc.

Job Description

Some aspects of this role as an HPC specialist may include authoring components such as cuBLAS, cuDNN, AITemplate, and FlashAttention, and developing runtimes such as an LLM disaggregated runtime. HPC specialists spend time optimizing programs to reduce accelerator idle time. They also develop tools to debug (e.g., cuda-gdb) and profile workloads that utilize accelerated computing hardware (such as the PEs/SFUs in MTIA or the Transformer Engine in H100). They are systems experts who are able to design, debug, and accelerate AI workloads from single-node scale-up to multi-node scale-out distributed systems. They are also able to influence the next generation of silicon architectures (such as Tensor Cores in V100 or the Transformer Engine in H100) based on evolving AI workload needs.

We are hiring in multiple locations.

Software Engineer, Systems ML - HPC Specialist Responsibilities

  • Apply relevant AI and machine learning techniques to build & optimize our intelligent systems that improve Meta's products and experiences
  • Develop custom/novel architectures, define use cases, and develop methodology & benchmarks to evaluate different approaches
  • Apply in-depth knowledge of how the machine learning system interacts with the other systems around it
  • Drive large efforts across multiple teams
  • Assist in goal setting related to project impact, AI system design, and ML excellence
  • Mentor other AI Engineers & improve the quality of AI work in the broader team

Minimum Qualifications

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.
  • 4+ years of experience in HPC and parallel computing.
  • Proficiency in GPU programming using CUDA and familiarity with CUDA libraries (cuBLAS, cuDNN, etc.).
  • Proven track record of leading successful HPC projects.
  • Proven technical expertise in HPC architectures and technologies.
  • Effective leadership and communication skills.

Preferred Qualifications

  • PhD in Computer Science, Computer Engineering, or relevant technical field.
  • Experience developing AI algorithms or AI-System infrastructure in C/C++ or Python.
  • Experience with distributed systems or on-device algorithm development.

Company Info.

Meta Platforms, Inc.

Meta Platforms (formerly known as Facebook Inc.) is a large technology company that was founded in 2004 by Mark Zuckerberg and several of his college roommates. The company is based in Menlo Park, California, and is primarily known for its flagship social media platform, Facebook. In addition to its consumer-facing products, Meta Platforms also offers a range of advertising and marketing services to businesses and organizations.

  • Industry
    Advertising, Consumer electronics, Social media company, Artificial intelligence
  • No. of Employees
    76,000
  • Location
    1 Hacker Way, Menlo Park, CA 94025, USA


Meta Platforms, Inc. is currently hiring for Software Engineer, Machine Learning jobs in New York, NY, USA, with an average base salary of $172,994 - $241,000 per year.
