Data Science AI-CV-NLP, Summer Intern

Ancestry

Job Description

What you will do:

Ancestry is looking for an exceptional, passionate, and highly motivated Data Science Intern to join our Data Science AI team this summer. The Data Science AI-CV-NLP team develops generative AI, CV, NLP, and LLM models that extract and organize text and image information from billions of historical and genealogical records, helping customers discover and connect with their family history. As a Data Science Intern on the AI-CV-NLP team, you will build, train, and fine-tune models that support product development, customer success, and content creation across our Family History business. You will also work closely with engineering teams to train, optimize, and deploy models.

  • Implement state-of-the-art generative AI, NLP, LLM, and CV solutions for NER, relation extraction, summarization, topic analysis, entity resolution, knowledge graphs, embedding-based information retrieval, story generation, AI-driven chat, etc. across various genealogical and historical collections such as newspapers, city directories, family history books, and birth, marriage, and death records
  • Analyze model performance and explore zero-shot/few-shot label generation to augment or replace iterative manual labeling when curating and refining training sets
  • Collaborate with ML Ops and Data Science Engineers to deploy datasets, truthsets, models, pipelines, and training and inference code to a cloud-based model registry
  • Effectively communicate and present deliverables and solutions to teams, stakeholders, and executives 

Who You Are: 

  • Candidate for an advanced degree (MS/PhD) in Computer Science, Data Science, Statistics, Mathematics, Linguistics, Engineering, or a data-related quantitative field
  • Specialization in generative AI, language models, computer vision, deep learning, machine learning, with software development expertise
  • Experience with applied research through understanding and implementing published models and methods for practical application to real-world problems
  • Strong proficiency in Python and related AI, LLM, CV, and/or NLP tools and libraries, and familiarity with frameworks and libraries such as PyTorch, Hugging Face, OpenAI, TensorFlow, spaCy, the SciPy stack, and scikit-learn

Nice to Have: 

  • Experience with LLMs, including training/fine-tuning, prompt engineering, RLHF, performance evaluation and cost analysis
  • Experience with NLP techniques such as named entity recognition, relationship extraction, document classification, document summarization, topic modeling, machine translation, sentiment analysis, dialogue systems
  • Experience in document image processing, e.g., computer vision methods, image classification, object detection, segmentation, layout analysis, redaction, handwriting recognition
  • Familiarity with NLP technologies such as NLTK, spaCy, pandas, and NumPy, along with an understanding of pre-trained language models and architectures like BERT (and variants), GPT, T5, XLNet, PL-Marker, TPLinker, OneRel, Hugging Face and OpenAI models, etc.
  • Familiarity with LLMs and GenAI models such as LLaMA, Falcon, GPT*, BLIP, CLIP, etc.

Internship Program Details:

  • Students must be enrolled in an accredited U.S. educational institution with a graduation date after August 2024.
  • Summer 2024 program dates are May 13 – September 6 (Please note we will have three intern onboarding dates to choose from: May 13th, May 28th, and June 10th. Students may offboard every Friday, beginning August 9th. All internships must be wrapped up by September 6th).
  • FULLY PAID temporary housing and travel to and from the internship are provided.
  • All summer internships will be in Lehi, Utah. You will work a combined hybrid and office-based schedule that allows you to choose which days you come into the office and which days you work from temporary housing/home (Utah students).
  • Interns have the opportunity to network and partner with other interns and industry-leading professionals.
  • You will participate in engaging events, including executive speaker sessions, professional development, and our annual Intern Days to showcase your project and work.
  • You will be required to work a full-time schedule (40 hours/week), Monday-Friday.
  • Company-issued laptop and equipment will be provided for the duration of the internship program.
  • Our interns enjoy mentorship and experience challenging work while receiving a great compensation package, temporary housing, and having fun, captivating experiences—we have it all!

Additional Information:

Ancestry is an Equal Opportunity Employer that makes employment decisions without regard to race, color, religious creed, national origin, ancestry, sex, pregnancy, sexual orientation, gender, gender identity, gender expression, age, mental or physical disability, medical condition, military or veteran status, citizenship, marital status, genetic information, or any other characteristic protected by applicable law. In addition, Ancestry will provide reasonable accommodations for qualified individuals with disabilities.

All job offers are contingent on a background check screen that complies with applicable law. For San Francisco office candidates, pursuant to the San Francisco Fair Chance Ordinance, Ancestry will consider for employment qualified applicants with arrest and conviction records.

Company Info.

Ancestry

Ancestry.com LLC is an American genealogy company based in Lehi, Utah. The largest for-profit genealogy company in the world, it operates a network of genealogical, historical records, and related genetic genealogy websites.


Ancestry is currently hiring for Data Science Summer Internship positions in Lehi, Utah, USA, with an average base salary of $20 - $50 per hour.
