Site Reliability Engineer (KR)

Gauss Labs

Job Description

Gauss Labs is seeking a highly skilled Site Reliability Engineer to join our team. As an SRE at Gauss Labs, you will play a critical role in ensuring the reliability, performance, and scalability of our industrial AI platform. You will be responsible for building and maintaining robust solutions that support our growing business at customer sites.

Responsibilities

  • Monitoring and Alerting: Creating and maintaining robust monitoring systems to proactively identify and resolve issues before they impact customers. Implementing effective alerting mechanisms to ensure timely response to critical events.
  • Incident Response: Participating in on-call rotations and leading incident response efforts to minimize downtime and restore service quickly.
  • Automation: Developing and implementing automation tools and scripts to streamline operations, reduce manual effort, and improve efficiency.
  • Capacity Planning: Forecasting resource needs, optimizing resource utilization, and ensuring the customers’ infrastructure can handle increasing workloads.
  • Performance Optimization: Identifying and resolving performance bottlenecks, optimizing system performance, and improving response times.
  • Collaboration: Partnering with software engineers, data scientists, and other teams to ensure alignment and efficient operations.
  • Customer Focus: Working closely with the AI Program Manager and the Technical Account Manager to understand customer issues, provide technical support, and improve customer satisfaction.
  • Continuous Improvement: Driving a culture of continuous improvement by identifying opportunities to enhance system reliability, performance, and efficiency.

Basic Qualifications

  • Bachelor's degree in computer science, engineering, or a related discipline.
  • 5+ years of industry experience as a Site Reliability Engineer.
  • Experience with cloud platforms (e.g., AWS, GCP, Azure).
  • Experience with scripting languages (e.g., Python).
  • Experience with monitoring and alerting tools (e.g., Prometheus, Grafana).
  • Experience in ticket management, issue resolution, and troubleshooting.
  • Strong problem-solving and troubleshooting skills.
  • Ability to work independently and as part of a team.
  • Excellent customer communication and interpersonal skills.

Preferred Qualifications

  • Knowledge of containerization technologies (Docker, Kubernetes).
  • Knowledge of AI/ML infrastructure and workloads.
  • Knowledge of big data technologies (Hadoop, Spark).
  • Fluency in verbal and written English.

Company Info.

Gauss Labs

We normalize AI. Gauss Labs aims to revolutionize manufacturing by building industrial AI systems beyond human capabilities. Founded in August 2020 with two international locations in San Jose, CA, and Seoul, Korea, Gauss Labs is home to Gaussians who are enthusiastic about pursuing this goal under balanced and inspiring leadership.

  • Industry
    Artificial intelligence
  • No. of Employees
    52
  • Location
    Palo Alto, CA, USA

Gauss Labs is currently hiring Site Reliability Engineers in Yeoksam-dong, Gangnam-gu, Seoul, South Korea, with an average base salary of ₩850,000 - ₩1,000,000 per month.