Support Engineer - AI Server Systems

Tenstorrent Inc.

Job Description

Job Responsibilities

  • Maintenance, inspection and troubleshooting of AI servers and related systems (GPU clusters, storage, network equipment, etc.)
  • First-line fault isolation of server failures, on-site repairs, and parts replacement
  • Operational status monitoring and log analysis using the NOC (Network Operations Center) and remote monitoring tools
  • Preparation and submission of incident reports when problems occur
  • Support for firmware, BIOS, and driver updates
  • Planning and execution of routine inspections and preventive maintenance for customers
  • Escalation support in collaboration with engineering and support departments
  • Management of maintenance parts inventory and coordination of parts deliveries
  • Support for on-site deployments and supervision of installation and relocation work

Required Skills and Experience

  • Experience maintaining x86 servers (especially GPU servers)
  • Ability to isolate hardware problems (power supplies, memory, HDDs, PCIe, GPUs, etc.)
  • Experience operating in a Linux (Ubuntu / RHEL / CentOS) environment
  • Basic networking knowledge (L2/L3, TCP/IP, DHCP, IPMI)
  • Experience providing on-site technical support at customer sites
  • Documentation skills for system maintenance and troubleshooting
  • Experience using hardware diagnostic tools (ipmitool, smartctl, nvidia-smi, etc.)
  • Experience reading English manuals and communicating with overseas support desks
  • Standard driver's license (for travel to customer sites)

Nice to Have

  • Experience with hardware such as NVIDIA GPU servers (e.g., DGX/HGX) and Supermicro, Inspur, or Lambda systems
  • Knowledge of Ethernet, InfiniBand, NVLink, and PCIe switches
  • Experience in data center operations and maintenance
  • Fundamental knowledge of GPU-based deep learning and AI workloads
  • Experience with simple automation using Linux shell scripts

This offer of employment is contingent upon the applicant being eligible to access U.S. export-controlled technology. Due to U.S. export laws, including those codified in the U.S. Export Administration Regulations (EAR), the Company is required to ensure compliance with these laws when transferring technology to nationals of certain countries (such as EAR Country Groups D:1, E:1, and E:2). These requirements apply to persons located in the U.S. and in all countries outside the U.S. Because the position offered will have direct and/or indirect access to information, systems, or technologies subject to these laws, the offer may be contingent upon your citizenship or permanent residency status, or upon your ability to obtain prior license approval from the U.S. Commerce Department or another applicable federal agency. If employment is not possible due to U.S. export laws, any offer of employment will be rescinded.

Company Info.

Tenstorrent Inc.

The Tenstorrent team combines technologists from different disciplines who come together with a shared passion for AI and a deep desire to build great products. We value collaboration, curiosity, and a commitment to solving hard problems.


Tenstorrent Inc. is currently hiring for Technical Support Engineer positions in Tokyo, Japan, with an average base salary of ¥5,000,000 to ¥6,500,000 per year.
