Systems Engineer, AI Systems

Meta Platforms, Inc.

Job Description

RTP Engineers work closely with HW/SW co-design teams, hardware designers, networking teams, system manufacturers, component vendors, capacity engineering, production engineering, production services, and data center operations teams to enable new systems that will be deployed in our production data centers. We also work across service and hardware architectures for new AI systems, build prototypes to demonstrate their value, enable go/no-go decisions, and optimize these systems in production.

Systems Engineer, AI Systems Responsibilities

  • Interface with external vendors and internal hardware, mechanical, power, thermal, manufacturing, and software engineers to understand system architecture and to develop and execute test suites for various architectures
  • Leverage a deep understanding of HW/SW systems to execute full product life cycles (prototyping, deployment, and support)
  • Champion engineering and operational excellence, establishing metrics and processes for regular assessment and improvement
  • Develop visibility through data visualization and implement systemic solutions to hardware health issues
  • Proactively create experiments and tooling to detect and diagnose hardware/firmware/software health issues
  • Troubleshoot, diagnose, and root-cause system failures, and isolate the components and failure scenarios, while working with internal and external stakeholders
  • Drive the necessary discussions with external and internal teams on test specifications and methodologies to continuously improve test quality

Minimum Qualifications

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.
  • 5+ years of work experience in one or more domains such as: ASIC development (silicon design, bring-up, or characterization), compute (ARM, x86), AI/ML hardware/software (GPUs, TPUs), networking (e.g., switches, NICs), interconnect technologies (e.g., optics, DAC)
  • Knowledge of server architecture and components across Compute/Storage/AI Systems
  • Experience troubleshooting problems at the system level, spanning multiple components as well as hardware/software boundaries
  • 3+ years of experience supporting hardware design teams (silicon design or datacenter platform design: compute, storage, etc.) in end-to-end product development cycles.
  • 3+ years of experience in leadership and ownership of high-impact, high-profile, at-scale projects.

Preferred Qualifications

  • 3+ years of experience with a subset of the following AI systems areas: accelerators (GPU/ASIC), kernel development, performance optimization (e.g., on NVIDIA, AMD, Intel, or other accelerators), computer architecture, HPC communication libraries (e.g., NCCL, MPI), performance enablement, tracing, profiling, and debugging.
  • Experience with architecture of disaggregated systems at scale
  • 4+ years of experience with I/O and memory buses such as PCIe and DDR

Company Info.

Meta Platforms, Inc.

Meta Platforms (formerly known as Facebook Inc.) is a large technology company that was founded in 2004 by Mark Zuckerberg and several of his college roommates. The company is based in Menlo Park, California, and is primarily known for its flagship social media platform, Facebook. In addition to its consumer-facing products, Meta Platforms also offers a range of advertising and marketing services to businesses and organizations.

  • Industry
    Advertising, Consumer electronics, Social media, Artificial intelligence
  • No. of Employees
    76,000
  • Location
    1 Hacker Way, Menlo Park, CA 94025, USA

Meta Platforms, Inc. is currently hiring for Systems Engineer roles in Menlo Park, CA, USA, with an average base salary of $190,000 - $256,000 / Year.