HPC Systems Developer

Vector Institute

Job Description

The Vector Institute is seeking an HPC Systems Developer to join our growing team in Toronto as we continue the work of making Canada a centre of expertise for AI in the world.

The incumbent in this role will participate in the building and maintenance of High-Performance Computing environments for world class research in Machine Learning.

The HPC Systems Developer will take responsibility for managing performance metrics, systems reporting, and custom development tasks, and will work with teams within and outside Vector to provide insight into delivering the best computational experience for our researchers.

We are seeking a highly motivated Systems Developer with a hands-on, problem-solving approach to managing and troubleshooting high-tech environments.

KEY RESPONSIBILITIES

  • Design, develop, and maintain critical, complex software solutions in support of HPC infrastructure monitoring;
  • Maintain up-to-date reporting of HPC resource utilization;
  • Build tools to assist in automation of infrastructure management;
  • Collaborate with partner organizations and initiatives to develop cross-organizational software solutions;
  • Document every facet of software solutions as a reference for further maintenance and upgrades;
  • Provide input for preventing future problems as well as incorporating solutions to current concerns; and,
  • Experiment with new technologies relevant to the area of development; recommend improvements to techniques, procedures or other aspects of technical development.

KEY SUCCESS MEASURES

  • Ensures the smooth functioning of the research systems by undertaking detailed reporting of HPC operations, job statistics, and performance metrics.
  • Researchers and the enterprise operations feel supported in all other computing needs.
  • Builds and maintains tools to inform the evolution of our environment, such as reporting systems, websites and metrics summarizations.

PROFILE OF THE IDEAL CANDIDATE

  • Degree or diploma in computer science or engineering, or equivalent experience of more than five years of systems programming in a UNIX/Linux or other complex computing environment;
  • Advanced working knowledge of mainstream programming languages (Python, JavaScript, Java, C/C++, etc.) and the LAMP stack;
  • Experience with production software development and maintenance, including Git/GitHub workflows and continuous integration systems;
  • More than five years of proven, hands-on experience in Linux/UNIX systems programming (Ubuntu, RedHat, CentOS), preferably in a research environment;
  • Experience maintaining application tools and databases such as MySQL and PostgreSQL;
  • Proven programming/scripting skills as they pertain to infrastructure management and monitoring;
  • Experience managing and troubleshooting software and code, using mostly open-source software;
  • Demonstrated ability to learn quickly;
  • Demonstrated ability to prioritize tasks and resolve problems in a timely manner;
  • Ability to work autonomously, multi-task and work in a fast-paced and stressful environment;
  • Strong attention to detail;
  • Ability to write technical documentation; and,
  • Excellent verbal and written communication skills.

The following qualifications and experience are considered assets:

  • Hands-on experience using an HPC workload management system such as Slurm, SGE, Moab/Torque, or an equivalent scheduler
  • Good understanding of file systems such as ZFS and GPFS
  • Expertise in analyzing and reporting on large and volatile data sets

The Vector Institute supports a flexible work environment. This position requires a reasonable degree of on-site work, which will vary depending on project commitments and organizational activities. As a result, the ability to work remotely will fluctuate and will be determined in accordance with current business needs.

At the Vector Institute, we are committed to driving excellence and leadership in Canada’s knowledge, creation, and use of AI to foster economic growth and improve the lives of Canadians. We strive for greater inclusion in the programs and culture that we build by welcoming and encouraging applications from all qualified candidates. This includes, but is not limited to, applicants who are Indigenous, 2SLGBTQIA+, racialized persons/visible minorities, women, and people with disabilities.

If you require an accommodation at any point throughout the recruitment and selection process, please contact hr@vectorinstitute.ai and we will happily work with you to meet your needs.

Company Info.

Vector Institute

The Vector Institute stands as an autonomous, non-profit entity committed to pioneering research within the realm of artificial intelligence (AI), particularly specializing in machine and deep learning. Collaborating with a wide spectrum of partners including institutions, industries, startups, as well as incubators and accelerators, we strive to propel AI research forward and foster its widespread application, adoption, and commercialization thr

  • Industry
    Artificial intelligence,Computer software
  • No. of Employees
    436
  • Location
    Toronto, ON, Canada


Vector Institute is currently hiring High Performance Computing Engineer Jobs in Toronto, ON, Canada with average base salary of Can$91,000 - Can$194,000 / Year.
