Job Description
The Senior Systems Test Engineer will play a critical role in ensuring the quality and reliability of our multimodal AI inferencing systems, which are designed for deployment in data centers and enterprise networks. This individual will be responsible for designing test strategies, automating tests, and evaluating system performance to validate functionality, scalability, and robustness. The ideal candidate will have hands-on experience in test automation for data-intensive hardware products, platform testing, hardware-in-the-loop testing, troubleshooting deployed products, and network and RDMA performance evaluation, along with proficiency in CI/CD pipelines, familiarity with test automation suites, and expertise in scripting languages such as Python.
Key Responsibilities:
Test Strategy and Automation:
- Design comprehensive test strategies and plans to validate the functionality, performance, and scalability of multimodal AI inferencing systems.
- Develop and implement automated test suites using industry-standard frameworks such as Google Test or Pytest, ensuring thorough coverage of system functionalities.
- Integrate automated tests into CI/CD pipelines to enable continuous testing and deployment workflows.
Hardware Testing and Validation:
- Utilize hardware-in-the-loop testing methodologies to validate the interaction between software and hardware components in AI inferencing systems.
- Conduct hands-on testing of data-intensive hardware products, including AI accelerators, to identify defects and performance bottlenecks.
Platform and System Testing and Validation:
- Design test strategies and test cases, including stress tests, to monitor thermal health at the system and platform-component levels in typical deployment scenarios.
- Design test suites and conduct tests to validate the chassis management features.
System Performance Evaluation:
- Evaluate network and RDMA performance of AI inferencing systems, identifying optimization opportunities and potential areas for improvement.
- Analyze system metrics and performance data to assess scalability, reliability, and efficiency under different workload scenarios.
Issue Troubleshooting and Debugging:
- Investigate and troubleshoot issues reported for products deployed in data centers and enterprise networks, utilizing logs, monitoring tools, and diagnostic techniques.
- Collaborate with cross-functional teams to resolve issues and implement corrective actions in a timely manner.
Documentation and Reporting:
- Document test plans, procedures, and results in a clear and concise manner, ensuring traceability and repeatability of test activities.
- Generate detailed reports and presentations to communicate test findings, performance metrics, and recommendations to stakeholders.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.
- 5+ years of experience in systems test engineering roles, with a focus on hardware and software testing for data-intensive products.
- Proficiency in designing and implementing test automation frameworks using Google Test, Pytest, or similar tools.
- Hands-on experience with CI/CD pipelines and continuous testing practices.
- Experience with hardware-in-the-loop testing.
- Proficiency in scripting languages such as Python for test automation and data analysis.
- Strong troubleshooting and debugging skills, with the ability to diagnose complex issues in deployed systems.
- Knowledge of network architecture, protocols, and configuration, with experience evaluating network performance.
- Excellent communication skills and ability to collaborate effectively in a fast-paced, team-oriented environment.
- Experience collaborating with stakeholders in software and hardware teams and producing detailed test plan documents.
Recogni is an equal opportunity employer. We believe that a diverse team is better at tackling complex problems and coming up with innovative solutions. All qualified applicants will receive consideration for employment without regard to age, color, gender identity or expression, marital status, national origin, disability, protected veteran status, race, religion, pregnancy, sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances.
Company Info.
Recogni
The automobile industry has arrived at a crossroads. The transition to electric vehicles (EVs) and the vitalized development of fully autonomous vehicles (AVs) have placed a heavy burden on fitting extraordinary amounts of computational power for artificial intelligence within the energy budget of batteries without affecting range. While battery technology is improving slowly, advances in compute efficiency have stalled as mere Moore's Law scaling of