Graph analytics, KYC Screening, Natural Language Processing (NLP), Neural Networks, Object-Oriented programming (OO languages), Python Programming, Snorkel
Deep Discovery is hiring a Data Scientist with domain experience in Anti-Money Laundering (AML) or Know Your Customer (KYC) to help us build and understand a 1.5-billion-node business graph of legitimate and illegitimate businesses.
This graph spans the globe and combines text and structured data to map relationships between people, companies and organizations, drawing on open sources of information such as news on the web. The business graph drives our network-centric risk-scoring models for the customers of global financial institutions. This is known as a KYC system for AML: banks use these systems to evaluate the risk of doing business with their clients so they don’t face stiff fines from regulatory agencies. As part of our social mission, we give free access to leading investigative journalists and anti-corruption NGOs around the world.
We are taking a network-centric approach to KYC that evaluates clients in the context in which they do business. This involves several machine learning tasks: extracting knowledge graphs from news and other text; entity and identity resolution over the networks we collect about the economy; representation learning on the resulting graphs and their associated documents; and building a scoring engine that uses our business graph to produce accurate risk scores. Because users do not trust predictions without explanations and the cost of errors is high, the final component is the most critical: the system must be explainable in terms of the graphs from which we draw conclusions, so we use a graph database and network visualizations to explain our risk scores.
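To make the idea of a graph-explainable risk score concrete, here is a minimal sketch, not our actual scoring engine. It assumes a hypothetical toy graph (`EDGES`), hypothetical per-entity risk values (`BASE_RISK`), and a simple rule in which a client inherits each nearby entity's risk discounted by `decay ** hops`. The hypothetical helper `network_risk` returns both the score and the shortest path behind every contribution, so each number can be traced back to the graph.

```python
from collections import deque

# Hypothetical toy business graph; real data would come from the knowledge graph.
EDGES = [
    ("ClientCo", "DirectorA"),
    ("DirectorA", "ShellLtd"),
    ("ClientCo", "SupplierB"),
    ("ShellLtd", "SanctionedPerson"),
]
# Hypothetical intrinsic risk per entity in [0, 1] (e.g. from adverse news or sanctions lists).
BASE_RISK = {"SanctionedPerson": 1.0, "ShellLtd": 0.4}


def build_adjacency(edges):
    """Turn an undirected edge list into an adjacency dict of sets."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def network_risk(adj, base_risk, client, max_hops=3, decay=0.5):
    """Score `client` by the risk of entities within `max_hops`, discounted by decay**hops.

    Returns (score capped at 1.0, explanations), where each explanation is
    (contribution, shortest path from the client to the risky entity).
    """
    # Breadth-first search records one shortest path to every reachable entity.
    paths = {client: [client]}
    queue = deque([client])
    while queue:
        node = queue.popleft()
        if len(paths[node]) - 1 >= max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in sorted(adj.get(node, ())):
            if nbr not in paths:
                paths[nbr] = paths[node] + [nbr]
                queue.append(nbr)

    score = 0.0
    explanations = []
    for node, path in paths.items():
        risk = base_risk.get(node, 0.0)
        if risk > 0:
            contribution = risk * decay ** (len(path) - 1)
            score += contribution
            explanations.append((contribution, path))
    explanations.sort(reverse=True)  # largest contributions first
    return min(score, 1.0), explanations


if __name__ == "__main__":
    score, why = network_risk(build_adjacency(EDGES), BASE_RISK, "ClientCo")
    print(f"risk score: {score:.3f}")
    for contribution, path in why:
        print(f"  {contribution:.3f}  " + " -> ".join(path))
```

Returning the path alongside each contribution is what lets a score be shown as a network visualization rather than a bare number. A production system would replace the fixed decay with learned weights.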
We’re looking for a self-motivated data scientist to be the glue that holds the rest of our team together: four ML Engineers, a Data Engineer and a Visualization Engineer. You will write queries to understand identity resolution problems and turn what you find into rules for programmatic data labeling, a process known as weak supervision. You will analyze structured, semi-structured, unstructured and graph datasets. You will build APIs for web applications and dashboards with our Visualization Engineer. You will profile datasets and drive the analysis that informs the rest of our engineering. You will do light data engineering alongside our Data Engineer. You will sit at the center of the team and route information to where it needs to go. You will write the interface specifications for the systems we build.
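As a rough illustration of turning identity-resolution findings into labeling rules, here is a minimal sketch in plain Python. The record fields, the `CandidatePair` class, and the three labeling functions are hypothetical examples of rules, not our real ones. Each rule votes MATCH, NON_MATCH or ABSTAIN on a candidate pair, and the votes are combined by simple majority. Snorkel instead fits a label model that estimates each rule's accuracy; majority vote is substituted here only to keep the sketch self-contained.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

ABSTAIN, NON_MATCH, MATCH = -1, 0, 1


@dataclass
class CandidatePair:
    """Hypothetical pair of company records that might refer to the same entity."""
    name_a: str
    name_b: str
    country_a: Optional[str] = None
    country_b: Optional[str] = None
    reg_id_a: Optional[str] = None  # hypothetical company registration number
    reg_id_b: Optional[str] = None


LEGAL_SUFFIXES = {"ltd", "limited", "inc", "llc", "plc", "gmbh"}


def normalize(name):
    """Lowercase, strip edge punctuation from tokens, and drop common legal suffixes."""
    tokens = (t.strip(".,").lower() for t in name.split())
    return " ".join(t for t in tokens if t and t not in LEGAL_SUFFIXES)


# Labeling functions: each encodes one rule found by querying the data and
# returns MATCH, NON_MATCH, or ABSTAIN when the rule does not apply.
def lf_registration_id(p):
    if p.reg_id_a and p.reg_id_b:
        return MATCH if p.reg_id_a == p.reg_id_b else NON_MATCH
    return ABSTAIN


def lf_normalized_name(p):
    return MATCH if normalize(p.name_a) == normalize(p.name_b) else ABSTAIN


def lf_different_country(p):
    if p.country_a and p.country_b and p.country_a != p.country_b:
        return NON_MATCH
    return ABSTAIN


LFS = [lf_registration_id, lf_normalized_name, lf_different_country]


def majority_vote(votes):
    """Combine one pair's votes; abstain when no rule fires or on a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def weak_labels(pairs, lfs=LFS):
    """Apply every labeling function to every pair, then resolve by majority vote."""
    return [majority_vote([lf(p) for lf in lfs]) for p in pairs]


if __name__ == "__main__":
    pairs = [
        CandidatePair("Acme Holdings Ltd", "ACME Holdings Limited", "GB", "GB"),
        CandidatePair("Acme Holdings Ltd", "Acme Holdings Inc", "GB", "US"),
        CandidatePair("Blue River LLC", "Blue River LLC", "US", "PA", "123", "456"),
    ]
    print(weak_labels(pairs))
```

The second pair shows why a learned label model helps: the name rule and the country rule disagree, and majority vote can only abstain, whereas Snorkel's `LabelModel` would weight each rule by its estimated accuracy.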
Publications are a plus, but what we want most in a candidate is a track record of shipping products to real customers. We have data engineers, but we expect you to be largely self-sufficient in carrying out your work, so generalist skills are important. Candidates without advanced degrees are welcome: experience is education.
The ideal candidate will have:
Deep Discovery is an artificial intelligence system that helps banks, regulators, law enforcement, and journalists address the Ultimate Beneficial Ownership (“UBO”) problem in anti-money laundering, countering the financing of terrorism, and other efforts to fight financial crime. We apply advanced machine learning algorithms over a global knowledge graph of 500 million companies, directors, and officers to discover signals of financial crime within opa