Machine Learning Operations Engineer

Sanctuary AI

Job Description

Sanctuary AI, a multi-award-winning LinkedIn Top Startup company, is looking to hire a Machine Learning (ML) Operations Engineer. Reporting to the Data Collection Team Lead, you’ll gain a comprehensive understanding of the infrastructure that powers our ML training pipelines.

The best candidate for this role will be a versatile, creative engineer with proven experience in infrastructure management, cloud platform operations, and deployment automation. You’ll be a valued contributor as you learn how our sophisticated multi-degree-of-freedom robotic systems work, while collaborating with cross-functional teams, including ML, Platform, Product Design, and Hardware and Sensor teams, to build and operate our data collection and ML model training pipelines.

Our Success Criteria

  • Build and support a secure, extensive, scalable, repeatable, and high-performing data collection platform
  • Set up and manage the necessary infrastructure for machine learning workloads, including cloud services, containers, and orchestration tools
  • Evaluate and deploy new tools and processes to optimize the effectiveness of our data collection and ML research activities
  • Monitor training cluster performance, and troubleshoot hardware and software errors, including Docker and GPU driver issues
  • Build data collection and ML training pipelines, and support our researchers in containerizing ML workloads

Your Experience

Qualifications

  • Bachelor's degree or higher in Computer Science or related field
  • 3+ years of experience with Docker, Kubernetes, and at least one of AWS, GCP, or Azure cloud services
  • 2+ years of experience with ML frameworks, platforms, and tools
  • Knowledge of professional engineering practices for the full product life cycle, including coding standards, code reviews, source management, agile processes, testing, and operations
  • Demonstrated ability to design, implement, and test in a fast-paced environment

Skills

  • Demonstrated proficiency with Python for data and ML pipeline development
  • Demonstrated proficiency with Linux, Docker, and Kubernetes
  • Demonstrated proficiency with observability platforms such as Splunk, Datadog, ELK Stack, and Prometheus/Grafana
  • Demonstrated familiarity with MLOps and machine learning frameworks such as PyTorch and TensorFlow
  • A passion for deployment automation practices, such as GitOps and CI/CD

Traits

  • Above all else, a consistently positive attitude and a willingness to do whatever it takes to create robust solutions to complex problems
  • Optimistic listening and conflict resolution capabilities
  • Advanced verbal and written communication and interpersonal skills
  • Self-motivated and able to solve problems independently
  • Demonstrated ability to influence without authority and create a sense of urgency
  • Obsession with bringing human-like intelligence to machines

Working at Sanctuary AI

Sanctuary AI is an equal opportunity employer; employment with Sanctuary AI is based on skills, competence, and qualifications and will not be influenced in any way by race, color, religion, gender, national origin/ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, mental or physical disability, or any other legally protected status. In 2023, Sanctuary AI moved into a state-of-the-art office facility and has been recognized by LinkedIn as a Top Startup company.

Benefits

Full-time (non-co-op) employees enjoy medical/dental/vision coverage, life insurance, wellness programs, stock options, paid time off (3 weeks of vacation accrued annually, paid statutory holidays, paid statutory sick leave, and statutory parental leave), scheduling and worksite flexibility by role, and more.

About Sanctuary AI

Founded in 2018 by Geordie Rose, Olivia Norton, and Ajay Agrawal, Sanctuary AI is a Vancouver, Canada-based company. Sanctuary AI is on a mission to create the world’s first human-like intelligence in general-purpose robots that will help us work more safely, efficiently, and sustainably and, in the not-too-distant future, help us explore, settle, and prosper in outer space.

Company Info.

Sanctuary AI

  • Industry
    Artificial intelligence, Computer software
  • No. of Employees
    120
  • Location
    Vancouver, BC, Canada

Sanctuary AI is currently hiring for Machine Learning Operations Engineer jobs in Vancouver, BC, Canada, with an average base salary of Can$91,000 - Can$194,000 / year.
