Job Description
The Opportunity:
In this role, you will be instrumental in designing, implementing, and enhancing our state-of-the-art testing and performance evaluation processes centered around our cutting-edge data center Gen AI inference product. You will play a critical role in ensuring the quality and reliability of our multimodal AI inferencing systems, designed for deployment in data centers and enterprise networks. Your expertise will shape our Hardware-in-the-Loop (HIL) testbeds and automated test suites, ensuring our AI solutions deliver on quality and exceed performance expectations. You will have the opportunity to build and lead a high-performing team of system test engineers, fostering their growth and development within the organization.
What You'll Do:
- Lead the Recogni system test team focused on delivering the highest-quality and best-performing multimodal generative AI inference acceleration hardware and software systems.
- Work closely with multi-functional teams to understand requirements and then devise and execute test plans for both automated and manual hardware-in-the-loop (HIL) testing of new hardware and software releases; visualize results via test dashboards that convey regression, stability, KPIs, benchmark performance, power consumption, and other metrics.
- Drive new system setup and bring-up while championing continuous quality improvement initiatives, reporting issues, and providing engineering peers with feedback that improves product quality.
What You'll Bring:
- Bachelor's or Master's degree in Computer Science, Computer Engineering, Electronics, or a related field.
- 7+ years of proven experience designing and implementing post-silicon, system-level HIL testbeds, ideally with an established data center equipment supplier, hyperscaler, or cloud service provider.
- Deep understanding of automated testing frameworks and methodologies, with demonstrable proficiency in Python and bash scripting, communicating and controlling instruments, automatically driving measurement setups, performing data analysis, and extracting key performance parameters.
- Knowledge of Kubernetes, networking architecture, and protocols within data center environments, with exposure to server/network virtualization and container technologies.
- Hands-on experience with at least one popular network operating system, such as NX-OS, IOS, EOS, JunOS, Cumulus Linux, or SONiC.
- Aptitude for systematic documentation and reporting of test results.
- Excellent communication and collaboration abilities across different teams, with a willingness to occasionally travel and lead others.
Bonus Qualifications:
- Experience with DevOps tools and CI/CD pipelines.
- Knowledge of AI acceleration hardware.
- PyTorch, C++, and Rust expertise.
- Familiarity with current Gen AI model development and benchmarking practices.
Recogni's culture was built on the following values, which are as important to us as the business itself:
- Put people first. We only succeed when our people succeed.
- Ethics and integrity always. Be open, honest, and respectful of everyone.
- Think Big. Be ambitious and have audacious goals.
- Aim for excellence. Quality and excellence count in everything we do.
- Own it and get it done. Results matter!
- Make each person better. Together we are better than we would be as individuals.
- Embrace each other's differences.
- Embrace that there will be differences.
Recogni is an equal opportunity employer. We believe that a diverse team is better at tackling complex problems and coming up with innovative solutions. All qualified applicants will receive consideration for employment without regard to age, color, gender identity or expression, marital status, national origin, disability, protected veteran status, race, religion, pregnancy, sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances.
Company Info.
Recogni
The automobile industry has arrived at a crossroads. The transition to electric vehicles (EVs) and the accelerating development of fully autonomous vehicles (AVs) have made it a major challenge to fit the extraordinary amounts of computational power required for artificial intelligence within the energy budget of batteries without affecting range. While battery technology is improving slowly, advances in compute efficiency have stalled as mere Moore's Law scaling of