Solutions Architect, Large Language Model Inference

NVIDIA

Job Description

NVIDIA’s Worldwide Field Operations (WWFO) team is looking for an AI-focused Solutions Architect with expertise in Machine Learning (ML), Deep Learning (DL) and Data Science platforms. In particular, we are seeking a candidate with an understanding of neural Natural Language Processing (NLP), transformer architectures and Large Language Model (LLM) workflows. In this role you would focus to a significant extent on inference technology (e.g. model compression, model compilation, model serving).

In our Solutions Architecture team, we work with the most exciting computing hardware and software, driving the latest breakthroughs in artificial intelligence. We need individuals who can enable customer adoption of NVIDIA technology and develop lasting relationships with our technology partners, making NVIDIA an integral part of end-user solutions. We are looking for someone who always thinks about artificial intelligence, someone who can thrive in a fast-paced, rapidly developing field, and someone able to coordinate efforts between customers, corporate marketing, industry business development and engineering.

A successful candidate will work with groundbreaking NLP and LLM models that are fundamentally changing the way people use technology. As a Solutions Architect, you will be the first line of technical expertise between NVIDIA and our customers. Your duties will range from working on proof-of-concept demonstrations to driving relationships with key executives and managers in order to promote adoption of Large Language Models and streamline their deployment to production. Dynamically engaging with developers, scientific researchers, data scientists, IT managers and senior leaders is a significant part of the Solutions Architect role and will give you experience with a range of partners and technologies.

What You’ll Be Doing:

  • Work directly with key customers to understand their technology and provide the best solutions
  • Develop and demonstrate solutions based on NVIDIA’s and open-source NLP and LLM technology
  • Perform in-depth analysis and optimisation to ensure the best performance on GPU-based systems. This includes both training and inference NLP/LLM pipelines.
  • Partner with Engineering, Product and Sales teams to develop and plan the most suitable solutions for customers. Enable development and growth of product features through customer feedback and proof-of-concept evaluations.
  • Build industry expertise and become a contributor in integrating NVIDIA technology into Enterprise Computing architectures.
  • Work closely with customers' data science and IT teams

What We Need to See:

  • Excellent verbal and written communication and technical presentation skills in English
  • MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics or other Engineering fields
  • 5+ years of work or research experience with Python, C++ or other software development, and the ability to work in a constantly evolving environment without losing focus.
  • A consistent record of academic and/or industry experience in fields related to machine learning, deep learning and/or data science.
  • Work experience with and knowledge of modern NLP, including a good understanding of transformer architectures, prompt learning and adapter tuning techniques (e.g. IA3 or LoRA). Understanding of model alignment approaches.
  • Understanding of key libraries used for NLP/LLM training (e.g. NeMo Framework, DeepSpeed) and inference (e.g. TRT-LLM, Triton Inference Server, HF Optimum).
  • You are excited to work with multiple levels and teams across organisations (Engineering, Product, Sales and Marketing teams)
  • Ability to multitask in a fast-paced environment, with drive and strong analytical and problem-solving skills.
  • Strong time-management and organisation skills for coordinating multiple initiatives, priorities and implementations of new technology and products into very sophisticated projects
  • You are a self-starter with a growth mindset and a passion for continuous learning and sharing findings across the team

Ways to Stand Out from the Crowd:

  • Experience working with larger transformer-based architectures for NLP, CV, ASR or other domains.
  • Prior experience applying NLP technology and its deployment to production.
  • Knowledge of DevOps technologies such as Docker, Kubernetes, Singularity, etc.
  • Experience running large-scale distributed DL training.
  • Understanding of HPC systems: data center design, high-speed interconnects (e.g. InfiniBand), cluster storage and scheduling, with related design and/or management experience.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.

Company Info.

NVIDIA

NVIDIA’s invention of the GPU sparked the PC gaming market. The company’s pioneering work in accelerated computing—a supercharged form of computing at the intersection of computer graphics, high performance computing and AI—is reshaping trillion-dollar industries, such as transportation, healthcare and manufacturing, and fueling the growth of many others.

  • Industry
    Cloud computing, Video games, Computer software, Semiconductors, Computer hardware, Consumer electronics, Artificial intelligence
  • No. of Employees
    22,473
  • Location
    2701 San Tomas Expressway, Santa Clara, CA 95050, USA

NVIDIA is currently hiring for Large Language Model Architect jobs in the United Kingdom, with an average base salary of £67,000 - £97,000 per year.