Senior Machine Learning (ML) Data Engineer

Pfizer Inc.

Job Description

The WRDM Machine Learning Research Hub is seeking experienced data engineers with a background in machine learning and software engineering, strong technical problem-solving skills, and experience creating scalable data pipelines and infrastructure for training, validating, and deploying ML solutions into production for broad usage. The role includes supervisory responsibility for a team working in this space.

Role Responsibilities

The successful candidate will help guide strategy and work with ML research scientists across WRDM to enable our proprietary data and external datasets to be leveraged for ML modeling. This will be accomplished by designing and implementing end-to-end data workflows for large-scale data ingestion, processing, tagging, and publishing, with an eye towards improving ML model performance over time.


Basic Qualifications

  • Formal training in Computer Science, Statistics, Applied Mathematics, Chemistry, Physics, a life science discipline, or a related technical discipline, or relevant practical experience.
  • Demonstrated ability to develop strategy in the data engineering field and to lead an internal or external matrix team.
  • 6+ years of programming experience in Python, Java, Scala, C++, or SQL.
  • 6+ years of experience in software design, development, and algorithm-related solutions for production-grade systems using machine learning.
  • 6+ years of experience managing code developed by multi-developer teams, following industry best practices.
  • Deep knowledge of one or more scientific data types (e.g. biomedical images, biomedical text, large-scale multidimensional 'omics, large- or small-molecule therapeutics, clinical or Real World Data).
  • A breadth of diverse leadership experiences and capabilities, including the ability to influence and collaborate with peers, develop and coach others, and oversee and guide the work of other colleagues to achieve meaningful outcomes and create business impact.

Preferred Qualifications

  • MS/PhD + 4 years of relevant research experience 
  • Experience with high performance computing (HPC) environments (SLURM/LSF/SGE schedulers) 
  • Familiarity with cloud computing infrastructure including Amazon Web Services (AWS) and distributed computing libraries (e.g. Spark, Hive, Impala, Kafka, etc.) 
  • Experience with containerization and orchestration tools (e.g. Docker, Singularity, Airflow, Luigi, Kubernetes, etc.)
  • Experience with workflow languages (CWL, WDL, Nextflow, etc.) 
  • Experience with CI/CD and automation tools (Terraform, CloudFormation, Jenkins, Ansible, etc.) 
  • Passion and curiosity for data and proven ability to take ideas from prototype to production. 

Technologies We Use:

Python (NumPy, pandas, Dask, PyTorch, TensorFlow, scikit-learn, RDKit, Weights & Biases, etc.), Java, C++, Slurm-based on-premises compute clusters, Google Cloud Platform, AWS, Docker, Singularity, and Kubernetes.

Other Job Details:

  • Additional Location Information: Cambridge, MA; La Jolla, CA; and Groton, CT
  • Eligible for Relocation Package
  • Eligible for Employee Referral Bonus

Company Info.

Pfizer Inc.

Pfizer Inc. is an American multinational pharmaceutical and biotechnology corporation headquartered on 42nd Street in Manhattan, New York City. Pfizer develops and produces medicines and vaccines for immunology, oncology, cardiology, endocrinology, and neurology. The company has several blockbuster drugs or products that each generate more than US$1 billion in annual revenues.

  • Location: 235 East 42nd Street, New York, New York, USA

Pfizer Inc. is currently hiring for Senior Machine Learning Data Engineer roles in La Jolla, San Diego, CA, USA, with an average base salary of $160,000 to $240,000 per year.
