(Senior) Clinical ML Data Scientist

Insitro

Job Description

Key to insitro’s approach to rethinking drug development is leveraging disease models, genetics, and clinical datasets to link in vitro and cellular phenotypes with patient outcomes.

Multimodal clinical datasets are essential for modeling patient heterogeneity, disease progression, and phenotypic diversity. Our goal is to develop sophisticated models from patient clinical records, common lab biomarkers, and, when available, higher-content multi-omic data. These models will identify coherent patient segments that reveal novel genetic signals and opportunities for targeted therapies.

As a clinical machine learning data scientist, you will develop, productionize, and deploy cutting-edge ML approaches to analyze and integrate large-scale multimodal phenotypic datasets. These include electronic health records, physiological monitoring, longitudinal clinical data, diverse biomarker data, and multi-omic modalities. You will work with clinical data from large human cohorts drawn from randomized clinical trials, electronic health records, national biobanks, and other sources. Through this collaborative effort, you will contribute to models that characterize patient disease state and progression and predict patient outcomes and clinical endpoints. This work will also help identify therapeutic targets and develop drugs with high efficacy and low toxicity.

In this role, you will focus on developing end-to-end modeling capabilities for phenotypic clinical data. You will own the creation of scalable, reproducible pipelines that extract clinical data from diverse sources and normalize patient-level records into our standardized data schemas. You will also develop novel feature extractors that exploit the underlying structure of clinical datasets, along with model architectures that generalize across held-out clinical trial sites and between datasets. The role focuses especially on multimodal models. These models incorporate both the longitudinal course of a patient's journey through various diagnostic ontologies and the progressively richer phenotypes available in specific cohorts. You will collaborate with the software engineering team to ensure these pipelines are robust, reusable platform components that can be deployed portably on large-scale datasets.

You will be joining a vibrant biotech startup that has long-term stability thanks to significant funding, yet is in a high-growth phase. A lot can change in this early and exciting phase, providing many opportunities for significant impact. You will work closely with a very talented team, learn a broad range of skills, and help shape insitro's culture, strategic direction, and outcomes. Join us, and help make a difference to patients! This role is preferably based in the San Francisco Bay Area or Boston, but we are open to discussing other locations in the United States and the UK.

About You

  • Ph.D. in biomedical informatics, machine learning, computer science, or a related discipline, or equivalent practical experience (e.g., a Master's degree plus 2 years of relevant industry experience);
  • Demonstrated ability to use cutting-edge statistical and machine learning methods for analyzing clinical data;
  • Extensive hands-on experience in several of the following areas: electronic health records; clinical trial data; disease progression modeling; multi-omic phenotypes; and biomedical or biophysical imaging modalities;
  • Demonstrated ability to rigorously identify and deal with confounders and complexities in human clinical data;
  • Experience using modern deep learning and machine learning frameworks (e.g., PyTorch, JAX, XGBoost);
  • Proficiency in Python and working with large-scale clinical data;
  • Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions;
  • Passion for making a difference in the world.

Nice to Have

  • Experience in probabilistic modeling and/or causal inference;
  • Experience working on decision making under uncertainty;
  • Experience working with EHR linked with genomic/molecular data;
  • Experience with genetic analyses (e.g., GWAS, rare variant analysis) and/or genomic data from different modalities (e.g., DNA sequencing, RNA-seq, proteomics, DNA accessibility assays);
  • Familiarity with cloud computing services (e.g., AWS or GCP) and workflow management tools or batch scheduling systems (e.g., SLURM);
  • Proficiency in a Linux environment (including shell/Bash scripting), experience with database languages (e.g., SQL), and experience with version control practices and tools (e.g., Git).

Benefits at insitro

  • Excellent medical, dental, and vision coverage; insitro pays 100% of premiums for employees
  • Excellent mental health and well-being support
  • Open vacation policy
  • Access to free onsite baristas and cafe with daily lunch and breakfast
  • Access to free onsite fitness center
  • Commuter benefits
  • Paid parental leave
  • Competitive pay and 401(k) matching
  • Flexible work schedule (on-site and remote)

Company Info

Insitro

insitro is a data-driven drug discovery and development company using machine learning and data at scale to transform the way that drugs are discovered and developed for patients. insitro is developing predictive machine learning models to discover underlying biologic state based on human cohort data and in-house generated cellular data at scale. These predictive models can be brought to bear on key bottlenecks in pharmaceutical R&D.

  • Industry
    Biotechnology Research
  • No. of Employees
    207
  • Location
    South San Francisco, CA, USA


Insitro is currently hiring for Data Scientist and Machine Learning roles in South San Francisco, CA, USA, with an average base salary of $160,000 to $240,000 per year.
