Senior Site Reliability Engineer

Zoom Video Communications

Job Description

Zoom is an award-winning workplace. Comparably has ranked us #1 for CEO, Company Happiness, Benefits, Compensation, Diversity, and more. In 2018, Glassdoor named us the 2nd Best US Workplace and named our CEO the Best Large Company US CEO, and we have also been recognized by Wealthfront and Business Insider. Our culture focuses on delivering happiness, on our commitment to transparency, and on the tangible benefits we provide to our employees and our customers.

The ideal candidate will have experience and qualifications for planning and managing operations infrastructure, including:

  • Experience planning and executing site deployments (AWS, Rackspace, private cloud).
  • Expertise automating system administration tasks with scripting tools (Python or shell preferred).
  • Aptitude for analyzing and troubleshooting operating system, networking, configuration and performance problems.
  • Fundamental understanding of Internet networking protocols: TCP/IP, TLS, DNS, HTTP, SMTP.
  • Ability to install, configure, and maintain Linux hosts and popular open source applications such as Nginx, Apache HTTP Server (httpd), Apache Tomcat, Postfix, and MySQL Server.
  • Experience with monitoring and automation tools such as Ansible, Splunk, Zabbix, etc.
  • Ability to communicate clearly with both technical and non-technical staff.
  • Familiarity with system hardening and server security best practices.

The position will have the following responsibilities:

  • Responsible for responding to alerts escalated from the Network Operations Center (NOC) and maintaining 99.99% availability. Utilize open source and proprietary tools to monitor production and act as tier 3 support for production issues.
    Provide 24x7 on-call support to restore services in case of an outage. Involved in the maintenance and expansion of the production infrastructure.
  • Responsible for capacity planning, purchasing hardware, configuring new services, writing scripts for automation of tasks, and writing Ansible playbooks for infrastructure automation.
  • Involved with patch management and remediating security vulnerabilities. Keep up to date on current industry vulnerabilities, using patch management infrastructure that keeps servers updated with the latest security patches.
  • Work with the team to validate patches in the development stage before rolling them out to production. Maintain baseline images with the latest security patches. Responsible for AWS infrastructure management and for driving down AWS costs by implementing tools that report on low instance utilization, CPU, disk, network, account creation, instance creation, instance termination, and resource allocation, as well as for architecting new services and building new security policies.
  • Build infrastructure for the future, including a microservice architecture in a Kubernetes environment that does not yet exist. Migrate legacy applications from bare metal servers to containers that run in Kubernetes.
  • Utilize CI/CD skills to implement a deployment pipeline so that legacy applications can be deployed to development, stage, and production with minimal downtime.

Company Info.

Zoom Video Communications

Zoom helps businesses and organizations bring their teams together in a frictionless environment to get more done. Our easy, reliable cloud platform for video, phone, content sharing, and chat runs across mobile devices, desktops, telephones, and room systems.

  • Industry
    Media, Information Technology
  • No. of Employees
    5,251
  • Location
    55 Almaden Blvd, San Jose, CA 95113, USA

Zoom Video Communications is currently hiring for a Senior Site Reliability Engineer position in Portland, OR, USA, with an average base salary of $160,000 - $240,000 / Year.