Java Programming, Python Programming, C++, C Programming, SQL, Cloud computing, Scala Programming, Machine learning techniques, Data science techniques, MATLAB Programming, PyTorch, TensorFlow, R Programming
Key to insitro’s approach to rethinking drug development is leveraging disease models, genetics, and clinical datasets to link in vitro and cellular phenotypes with patient outcomes.
Multimodal clinical datasets are an essential component of modeling patient heterogeneity, disease progression, and phenotypic diversity. Our goal is to develop sophisticated models from patient clinical records, common lab biomarkers, and, when available, higher-content multi-omic data. These models should identify coherent patient segments that reveal novel genetic signals and opportunities for targeted therapies.
As a clinical machine learning data scientist, you will develop, productionize, and deploy cutting-edge ML approaches to analyze and integrate large-scale multimodal phenotypic datasets. These include electronic health records, physiological monitoring, longitudinal clinical data, diverse biomarker data, and multi-omic modalities. You will work with clinical data from large human cohorts such as randomized clinical trials, electronic health records, national biobanks, and other sources. Through this collaborative effort, you will contribute to models that characterize patient disease state and progression, predict patient outcomes and clinical endpoints, and help identify therapeutic targets and develop drugs with high efficacy and low toxicity.
In this role, your focus will be on developing end-to-end modeling capabilities for phenotypic clinical data. You will own the creation of scalable, reproducible pipelines that extract clinical data from diverse sources and normalize patient-level records into our standardized data schemas. You will also develop novel feature extractors that exploit the underlying structure of clinical datasets, and design model architectures that generalize across held-out clinical trial sites and between datasets. The role will especially focus on developing multimodal models that incorporate both the longitudinal aspects of a patient's journey through various diagnostic ontologies and the progressively richer phenotypes available in specific cohorts. You will work with the software engineering team to ensure these pipelines are robust, reusable platform components that can be deployed portably on large-scale datasets.
You will be joining a vibrant biotech startup that has long-term stability due to significant funding, yet is in a high-growth phase. A lot can change in this early and exciting phase, providing many opportunities for significant impact. You will work closely with a very talented team, learn a broad range of skills, and help shape insitro’s culture, strategic direction, and outcomes. Join us, and help make a difference to patients! This role is preferably based in the San Francisco Bay Area or Boston, but we are open to discussing other locations in the United States and the UK.
About You
Nice to Have
Benefits at insitro
insitro is a data-driven drug discovery and development company using machine learning and data at scale to transform the way that drugs are discovered and developed for patients. insitro is developing predictive machine learning models to discover underlying biologic state based on human cohort data and in-house generated cellular data at scale. These predictive models can be brought to bear on key bottlenecks in pharmaceutical R&D.
South San Francisco, CA, USA
2-4 years