Sr Cloud Engineer - SRE

Revionics

Job Description

We’re looking for a passionate and talented teammate to help us scale and accelerate infrastructure and software deployment at Revionics. If you’re passionate about cloud-native technologies and infrastructure-as-code, we would love to hear from you. We are focused on implementing advanced cloud-native technologies and practices while driving the continuous delivery posture of the organization.

Our ideal candidate is a self-starter and has excellent communication skills. Our collaborative environment relies heavily on innovation, technical savvy, and problem-solving skills. This is a full-time, in-office position at our Bangalore, India location. As a Senior Site Reliability Engineer, you’ll be a major contributor to the company’s success. You’ll work with teams across the organization to build performant, reliable, and highly scalable software systems. Your technical leadership will help drive continuous integration and delivery for our market-leading AI SaaS products for the retail industry. Our next-gen infrastructure stack is based on GCP, Linux, Windows, Terraform, Kubernetes, and GitLab.

Required Skills:

  • Passion for reliable, scalable, observable software with a strong sense of ownership.
  • 6+ years’ experience developing and monitoring mission-critical systems.
  • Understanding of and ability to drive site reliability engineering concepts and practices such as SLOs, SLIs, error budgets, and their practical application in cloud environments.
  • Hands-on experience with Docker and Kubernetes, preferably on Google Cloud.
  • Proficiency with, and understanding of, a containerized development workflow.
  • Strong background in Linux/UNIX administration (e.g., RedHat/CentOS 7/Alpine Linux).
  • Strong background in Windows administration and troubleshooting (Windows 2019+).
  • Experience collaborating with engineering and operations teams to architect scalable solutions, conduct capacity planning, and optimize resource utilization on GCP.
  • Experience leading incident management, root cause analysis, and resolution efforts for critical incidents affecting GCP services, using automation to ensure swift resolution and minimal impact on operations.
  • Expertise in Infrastructure as Code (IaC) tools like Packer and Terraform.
  • Experience with configuration management tools like Puppet or Ansible.
  • Experience deploying large-scale Docker-based environments with Kubernetes, OpenShift, or a similar product.
  • Experience with languages like Bash, Python, or Go.
  • Experience with Kubernetes networking components (e.g., CNI plugins and service mesh technologies like Istio).
  • Experience implementing application clustering and load balancing concepts and technologies.
  • Proficiency with networking fundamentals, diagnostics, and troubleshooting.
  • Proficiency in using command-line tools to quickly triage and fix production issues.
  • Understanding of protocols and technologies like HTTP, SSL, LDAP, SQL, HTML, and XML.

Responsibilities:

  • Lead best practices for building and operating highly reliable systems.
  • Lead best practices for high availability, reliability, and performance of systems and services by implementing and adhering to reliability engineering practices.
  • Lead automation efforts to improve monitoring and observability and to reduce cloud spend.
  • Lead efforts to implement robust monitoring systems, define key metrics and alerts, and enhance continuous system health monitoring to proactively identify and address potential issues.
  • Identify bottlenecks, analyse system performance, and optimize configurations to enhance efficiency and performance of systems and services using automation.
  • Create and maintain documentation, best practices, and runbooks, and share knowledge with the team to ensure a consistent understanding of systems and processes.

We offer a competitive total rewards package including a base salary determined based on the role, experience, skill set, and location. For those in eligible roles, discretionary incentive compensation may be awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility.

We are an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran. By submitting an application for this job, you acknowledge that any personal data or personally identifiable information that you provide to us will be processed in accordance with our Candidate Privacy Notice.

Company Info.

Revionics

Revionics provides enterprise retailers around the world with leading, science-based solutions for pricing, promotions, markdowns and competitive insights to illuminate their way on the lifecycle pricing optimization journey. As a trusted partner for top retailers across a variety of industries and markets, Revionics delivers unparalleled results in ROI, profit lift, process efficiencies and more.

Revionics is currently hiring for Site Reliability Engineer jobs in Bengaluru, Karnataka, India, with an average base salary of ₹90,000 - ₹250,000 / month.
