Post-Doctoral Position: Multi-modal Pretrained Language Models for Health Care

Computer Science Research Institute of Toulouse (IRIT)

Job Description

Post-Doctoral Position:

Multi-modal Pretrained Language Models for Health Care

Supervision: José G. Moreno (jose.moreno@irit.fr) and Lynda Tamine-Lechani (lynda.lechani@irit.)

  • Location: IRIT lab, Toulouse, France, with an expected visit to University of Laval (Canada)
  • Deadline for applications: 31st January 2023 (or until the position is filled)
  • Start date: as soon as possible
  • Duration: 16 months
  • Keywords: Deep learning, Pretrained Language Models (PLMs), Graphs, Texts
  • Salary: about €2,750 gross per month (a higher salary is possible depending on experience)

Background: the IN-uTERO project

This post-doctoral position is part of the French-Canadian IN-uTERO project, which develops deep learning and federated learning models for patient representation learning and health risk prediction.

The overall societal goal of the IN-uTERO project is to leverage already available, funded, and accessible national and international cohorts built from big health data to efficiently automate the detection of teratogen and patient-risk-profile signals, and to actively prevent the prescribing of harmful drugs during pregnancy. Indeed, the determinants of congenital abnormalities and other side effects, dramatic for the child and their family, are multiple, and jointly considering all of these factors is complex. The innovative approaches we would like to apply in the IN-uTERO project, based on deep learning and federated learning over several data sources, now make it possible to take a large number of these concomitant factors into account. This will result in improved delivery of safe and effective medicine and health care, decreasing inadvertent harm and costs to society.

To this aim, two mother-child health cohorts will be used:

The Quebec Pregnancy Cohort (QPC) (www.medicationpregnancy.org) from CHU Ste Justine in Montreal, which includes data from 800,000 pregnancies and their children with up to 23 years of follow-up (1998-2022), and EFEMERIS (a French pregnancy/child cohort) (www.efemeris.fr) from CHU Toulouse in France, which includes over 169,000 mother-child pairs from 1 July 2004 until 31 December 2020. Multi-source and heterogeneous data are included, such as administrative data (birth, gender) and medical data (prenatal diagnoses along the pregnancy, occurrences of minor or major malformations, medical prescriptions, etc.). Since data privacy is at the heart of the project, the IN-uTERO proposal requires suitable learning strategies.

Scientific purpose

The scientific outcomes of the IN-uTERO project are expected to be in line with recent advances in the design of new pre-training objectives and model architectures for Pretrained Language Models (PLMs), such as BERT (Devlin et al., 2019) or task-oriented generative versions (Ouyang et al., 2022). Specifically, the main scientific challenge that will be addressed in the IN-uTERO project relates to multimodal PLMs.

Developing PLMs based on multi-source, heterogeneous data and external knowledge is a promising research direction (Xu et al., 2023). Multi-source and heterogeneous data mostly involve multiple modalities (e.g., text and image, text and video) and multiple data structures (e.g., structured data such as tables and graphs, and unstructured data such as texts).

Multimodal learning with transformers has recently been studied for a wide range of modalities, including vision and language (e.g., VisualBERT, Li et al., 2019) and video and language (e.g., VideoBERT, Sun et al., 2019).

However, combining structured and unstructured data, particularly graphs and texts, is still understudied. In a recent work, GraphCodeBERT (Guo et al., 2021) proposed a model for learning representations of program source code that jointly embeds abstract syntax trees and the textual comments of the code. In the medical domain, MedGTx (Choi et al., 2022) models the same patient admission (one source) using both graph-based and text-based inputs.

In this line of work, the aim of the postdoctoral work is to investigate the design of new pre-training objectives for transformer architectures that jointly model cross-temporal interactions between graphs and texts grounded on multi-source data (e.g., pairs of EHRs). A pre-training method with specific masking techniques, alignment with external knowledge, or data-augmentation-oriented objectives may be an initial target. Alternatively, both local and global contextual information on multiple modalities held by other sources may be considered, either at the input (early), intermediate-representation (middle), or prediction (late) level. At a later stage, in collaboration with project partners, we plan to increase model understanding by experts, which inherently helps assess if and when to trust model predictions when making decisions. We will particularly investigate the use of post-hoc and/or argumentative explanations through model-agnostic methods (Jekumar et al., 2020).
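As a purely illustrative sketch of the kind of masking-based pre-training objective mentioned above, the snippet below shows how text tokens and linearized graph-node tokens could be packed into one BERT-style input with modality segment ids, and how a joint masked-token objective could be built over both modalities. All function names, special tokens, and the linearization of the graph into node tokens are assumptions for illustration, not project specifications.

```python
import random

def build_joint_input(text_tokens, graph_node_tokens):
    """Concatenate text tokens and linearized graph-node tokens into a single
    BERT-style sequence; segment ids mark the modality (0 = text, 1 = graph)."""
    tokens = ["[CLS]"] + list(text_tokens) + ["[SEP]"] + list(graph_node_tokens) + ["[SEP]"]
    segments = [0] * (len(text_tokens) + 2) + [1] * (len(graph_node_tokens) + 1)
    return tokens, segments

def mask_for_joint_mlm(tokens, mask_prob=0.15, rng=None):
    """BERT-style masked-token objective applied jointly across both modalities:
    masked positions carry the original token as the prediction label; all other
    positions get label None (ignored by the loss)."""
    rng = rng or random.Random(0)
    specials = {"[CLS]", "[SEP]"}
    inputs, labels = [], []
    for tok in tokens:
        if tok not in specials and rng.random() < mask_prob:
            inputs.append("[MASK]")
            labels.append(tok)   # the model must recover the original token here
        else:
            inputs.append(tok)
            labels.append(None)  # position ignored by the loss
    return inputs, labels
```

In an actual model, the segment ids would be fed as modality embeddings added to the token embeddings, and graph structure would additionally be injected through attention masks or edge-aware positional encodings rather than through linearization alone.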

Company Info.

Computer Science Research Institute of Toulouse (IRIT)

The Computer Science Research Institute of Toulouse (IRIT), one of the largest Mixed Research Units (UMR 5505) at the national level, is one of the pillars of research in Midi-Pyrénées, with 600 permanent and non-permanent staff and a hundred external collaborators.

  • Industry
    Education
  • No. of Employees
    290
  • Location
    Toulouse, France


The Computer Science Research Institute of Toulouse (IRIT) is currently hiring for Postdoctoral Researcher, Deep Learning positions in Toulouse, France, with an average base salary of €2,750 - €4,000 per month.
