Site Reliability Engineer - ML/AI Platform (Remote)

Cisco Meraki

Job Description

Cisco Meraki is revolutionizing the way IT administrators manage their infrastructure by providing simple and secure cloud-managed solutions. With a large install base of customers and rich multifaceted data sets, the potential for data analytics to improve business performance for both our customers and our own business is enormous.

About the role

The Data Science Infrastructure team is a growing group that works closely with executives and leaders across the company to support the development of, and alignment on, our business strategy. We are looking for an experienced Site Reliability Engineer - AI/ML Platform to join our team at Cisco Meraki to design, deploy, secure, and maintain cloud-based AI/ML infrastructure. In this role, you will work with cross-functional teams such as Application Security, Site Reliability Engineering, Data Engineering, and Data Science to develop the services and platforms that drive AI/ML activities. This would be an outstanding fit for a solution-oriented, hands-on technical professional who can work autonomously and drive technical efforts to build robust, resilient, auto-scaling platform solutions.

What You Will Do

  • Design, build, and maintain AI/ML infrastructure that supports every stage of the ML workflow, including data ingestion, model building, and model operations.
  • Influence architectural decisions with a focus on security, scalability, and high performance.
  • Collaborate with other engineers on the team to foster sound infrastructure engineering principles and represent our engineering values.
  • Work with the Application Security and SRE teams at Meraki to deploy and secure both applications and infrastructure.
  • Improve and restructure the backend architecture so that it scales to ever-larger customers while ensuring a flawless user experience and high uptime.
  • Build end-to-end documentation and instrumentation of our platform to ensure visibility, automation, self-healing, and resiliency throughout the stack.

What Skills You Possess

  • BS or MS in Computer Science or a related technical field, or an equivalent combination of a graduate degree and work experience.
  • 5+ years of work experience in site reliability engineering, particularly with cloud providers or large-scale systems.
  • 5+ years of experience scripting or coding in Python or Bash.
  • Experience designing, deploying, and securing cloud-based AI/ML infrastructure (AWS preferred).
  • Experience with containerization technologies (Docker), orchestration platforms (Kubernetes), and CI/CD frameworks (GitLab).
  • Experience in writing code to deploy and automate infrastructure.
  • Experience working with infrastructure-as-code frameworks such as Terraform.
  • Experience supporting production systems to minimize customer downtime.
  • Strong written and verbal communication skills, and excellent attention to detail and accuracy.

Bonus Points For:

  • Experience maintaining Kubeflow or similar MLOps platform.
  • Experience or a desire to lead technical decisions and design discussions.
  • Experience in, or willingness to work in, an agile environment (Scrum, Kanban, etc.).

We encourage you to drop us a line even if you don’t have all the points above. That's a lot of different areas of responsibility! We will help you pick them up because we believe that great engineers come from diverse backgrounds.

Cisco Covid-19 Vaccination Policy

The health and safety of Cisco's employees, customers, and partners is a top priority. Our goal is to protect against and mitigate the spread of COVID-19 infection to maintain strong business resiliency during the pandemic. Therefore, Cisco may require new hires to be fully vaccinated against COVID-19 if the role requires business-related travel, meeting with customers/partners (including visiting third-party sites on behalf of Cisco), attending trade events, or entering Cisco offices, unless otherwise prohibited by applicable law, and in countries where COVID-19 vaccination is legally required. The company will consider legally required accommodations/exceptions for medical, religious, and other reasons as per the requirements of the role and in accordance with applicable law. Additional information about the requirements and accommodation process will be provided to candidates at the time of offer, based on region.

Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis. Cisco will consider for employment, on a case-by-case basis, qualified applicants with arrest and conviction records.

At Cisco Meraki, we’re challenging the status quo with the power of diversity, inclusion, and collaboration. When we connect different perspectives, we can imagine new possibilities, inspire innovation, and release the full potential of our people. We’re building an employee experience that includes appreciation, belonging, growth, and purpose for everyone.

Company Info.

Cisco Meraki

Cisco Meraki is a cloud-managed IT company headquartered in San Francisco, California. Its products include wireless, switching, security, enterprise mobility management, and security cameras, all centrally managed from the web. Meraki was acquired by Cisco Systems in December 2012.


Cisco Meraki is currently hiring for Site Reliability Engineer jobs in San Jose, CA, USA, with an average base salary of $120,000 - $190,000 / year.
