NVIDIA Senior Site Reliability Engineer Salary?

The average base salary for a Senior Site Reliability Engineer is NT$130,000 - NT$196,000 / Month at NVIDIA.

Senior Site Reliability Engineer Salary in Hsinchu City, Taiwan

The average base salary for a Senior Site Reliability Engineer is NT$130,000 - NT$196,000 / Month in Hsinchu City, Taiwan.

Technical Skills Required to Become a Senior Site Reliability Engineer?

To work as a Senior Site Reliability Engineer, you must have a degree in a relevant discipline: Artificial Intelligence (AI), Computer Science, Machine Learning, Mathematics, or Physics.

Looking for a Senior Site Reliability Engineer in Hsinchu City, Taiwan?

NVIDIA is currently hiring a Senior Site Reliability Engineer in Hsinchu City, Taiwan, and is looking for candidates with 6-8 years of relevant skills and work experience.

Is NVIDIA hiring Senior Site Reliability Engineers now?

NVIDIA seeks to hire qualified Senior Site Reliability Engineers with 6-8 years of experience.

Posted on: 22 Jan 2024

Senior SRE Software Engineer, Storage and Data

NVIDIA

Job Type: Full Time
Experience: 6-8 years
Salary: NT$130,000 - NT$196,000 / Month
Location: Hsinchu City, Taiwan
Job Function: Senior Site Reliability Engineer
Industry: Information Technology

Qualification

Degree in Artificial Intelligence (AI)
Degree in Computer Science
Degree in Machine Learning
Degree in Mathematics
Degree in Physics

Key Skills

Artificial intelligence, Analytical and Problem solving, Ansible, Bash scripting, Docker, Java Programming, Kubernetes-K8s, Machine learning techniques, Python Programming, REST API, RESTful, Site Reliability Engineering (SRE), Swift

Job Description

SRE at NVIDIA ensures that our DGX Cloud platform continues to be reliable and performant to meet the needs of our users. You will play a critical role in ensuring the reliability, availability, and performance of storage infrastructures for NVIDIA DGX GPU cloud platforms. You will collaborate with cross-functional teams to design, build, and maintain scalable and fault-tolerant storage solutions that support our mission-critical applications and services. Your expertise in storage systems and reliability engineering will be instrumental in minimizing downtime, improving system efficiency, and enhancing the overall user experience.

SRE is also a mindset and a set of engineering approaches to running efficient production systems, with a focus on eliminating manual work through modern automation practices and performance tuning. We promote self-direction to work on meaningful projects while striving to build an environment that provides the support and mentorship needed to learn and grow.

What You Will Be Doing:

Develop strategies to ensure the reliability and availability of storage systems, including redundancy, failover, and disaster recovery plans.

Continuously analyze and fine-tune storage systems for optimal performance, including throughput optimization, caching, and latency reduction. Identify and resolve performance bottlenecks to enhance overall system efficiency.

Develop and maintain automation scripts and tools to streamline storage provisioning, configuration, and maintenance tasks.

Implement monitoring and alerting systems to proactively identify and address issues.

Participate in on-call rotation to respond to storage-related incidents promptly, conduct root cause analysis of outages, and implement preventive measures.

Collaborate with cross-functional teams, including Compute SRE, development, and networking, to ensure seamless integration of large-scale storage solutions.

Work with AI/ML workloads to capture and correlate behavior in large clusters and workflows, which are otherwise hard to understand.

What We Need To See:

BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), with 5+ years equivalent practical experience.

Proven experience in storage system administration and site reliability engineering.

Experience with Git, RESTful APIs, Linux service operation, networking, complexity analysis, AWS S3, software design, and maintaining large-scale Linux-based systems.

Experience in one or more of the following languages: Ansible, Bash, Python, Go, YAML, Java.

Good knowledge of infrastructure configuration management tools like Ansible, Chef, Puppet, and Terraform.

Experience in using observability and tracing-related tools like InfluxDB, Prometheus, the Elastic (OpenSearch) stack, and Grafana.

Ways to stand out from the crowd:

Experience with storage solutions like OpenStack Swift (object), AWS S3 (object), DDN, and Lustre.

Strong skills in troubleshooting Linux and network issues with command-line tools and utilities.

Demonstrated SRE mindset, with a customer-first approach and a passion for ensuring customer satisfaction and success.

Interest in crafting, analyzing, and fixing large-scale distributed systems. Strong debugging skills with a systematic problem-solving approach to identify complex problems.

Experience in using or running large private and public cloud systems based on Kubernetes, OpenStack, and Docker.

Company Info.

NVIDIA

NVIDIA’s invention of the GPU sparked the PC gaming market. The company’s pioneering work in accelerated computing—a supercharged form of computing at the intersection of computer graphics, high performance computing and AI—is reshaping trillion-dollar industries, such as transportation, healthcare and manufacturing, and fueling the growth of many others.

Industry: Cloud computing, Video games, Computer software, Semiconductors, Computer hardware, Consumer electronics, Artificial intelligence
No. of Employees: 22,473
Location: 2701 San Tomas Expressway, Santa Clara, CA 95050, USA
Website: https://www.nvidia.com/

NVIDIA is currently hiring for Senior Site Reliability Engineer jobs in Hsinchu City, Taiwan, with an average base salary of NT$130,000 - NT$196,000 / Month.

Similar Jobs

Senior ML Engineer, Planning and Prediction – Autonomous Vehicles

NVIDIA

Shanghai, China

6-8 years

Artificial intelligence, CUDA/GPU programming, Deep Learning, Design, Effective communication skills, Machine learning techniques, NeurIPS, Python Programming, PyTorch, TensorFlow, TensorRT

Senior ML Engineer, Planning and Prediction – Autonomous Vehicles

NVIDIA

Beijing, China

6-8 years

Artificial intelligence, CUDA/GPU programming, Deep Learning, Design, Effective communication skills, Machine learning techniques, NeurIPS, Python Programming, PyTorch, TensorFlow, TensorRT

Software Engineering Manager, Test Engineering

NVIDIA

Beijing, China

8-10 years

Artificial intelligence, AI Robotic systems, Docker, Jenkins, KPIs, Leadership Skill, Machine learning techniques

Software Engineering Manager, Test Engineering

NVIDIA

Shanghai, China

8-10 years

Artificial intelligence, AI Robotic systems, Docker, Jenkins, KPIs, Leadership Skill, Machine learning techniques

Software Engineering Manager, Test Engineering

NVIDIA

Shenzhen, Guangdong Province, China

8-10 years

Artificial intelligence, AI Robotic systems, Docker, Jenkins, KPIs, Leadership Skill, Machine learning techniques

Principal Software Engineer, Drive Context Fusion - Autonomous Vehicles

NVIDIA

Beijing, China

12-14 years

Artificial intelligence, AI Robotic systems, Algorithms, API, C++, Computer Vision (CV), CUDA/GPU programming, Design, Effective communication skills, Leadership Skill, Optimization

Principal Software Engineer, Drive Context Fusion - Autonomous Vehicles

NVIDIA

Shanghai, China

12-14 years

Artificial intelligence, AI Robotic systems, Algorithms, API, C++, Computer Vision (CV), CUDA/GPU programming, Design, Effective communication skills, Leadership Skill, Optimization

Principal Software Engineer, Drive Context Fusion - Autonomous Vehicles

NVIDIA

Shenzhen, Guangdong Province, China

12-14 years

Artificial intelligence, AI Robotic systems, Algorithms, API, C++, Computer Vision (CV), CUDA/GPU programming, Design, Effective communication skills, Leadership Skill, Optimization

Software Engineer - Autonomous Vehicles

NVIDIA

Beijing, China

2-4 years

Artificial intelligence, AI Robotic systems, Autonomous vehicles, Data science techniques, Data Visualization, Effective communication skills, ETL frameworks, KPI, Machine learning techniques, Python Programming, SQL, Teamwork

Software Engineer - Autonomous Vehicles

NVIDIA

Shanghai, China

2-4 years
