We’re building a team that will hunt and mitigate extreme risks from future models.
This team will red team models to probe the most significant risks they might be capable of in critical areas like biosecurity, cybersecurity, and deception. We believe that clear risk demonstrations can significantly advance technical research and mitigations, as well as identify effective policy interventions to promote and incentivize safety. And if we figure out how to prevent the most serious risks, we unlock some of the most valuable applications of AI.
As part of this team, you will lead research to baseline current models and test whether future frontier capabilities could cause significant harm. Day-to-day, you may decide you need to finetune a model to see whether it becomes superhuman in an eval you’ve designed; whiteboard a threat model with a national security expert; test a new training procedure or how a model uses a tool; brief government, labs, and research teams. Our goal is to see the frontier before we get there.
By nature, this team will draw on an unusual combination of backgrounds. We are particularly looking for people like the following:
Science: For example, you’re a chemist who builds LLM agents to help your research. Or, you’ve built a protein language model and you enjoyed looking through the embedding space. You’re a team lead at an ML-for-drug discovery company. You’ve built software for astronauts or materials scientists.
Cybersecurity: You’re a white hat hacker who is curious about LLMs. You’re an academic who researches RL for cybersecurity. You’ve participated in CTFs and you want to automate one.
Alignment: You’ve written detailed, concrete scenarios of significant risk from AI in a way that can be tested. You have built model evals and have ideas about how they can be better.
Do not rule yourself out if you do not fit one of those categories. It’s very likely the people we’re looking for do not fit any of the above.
This team is a technical research team within the Policy team with strong collaboration with the broader Anthropic Research team.
About Anthropic
Anthropic is an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our customers and for society as a whole. Our interdisciplinary team has experience across ML, physics, policy, business and product.
Hybrid policy & US visa sponsorship: Currently, we expect all staff to be in our office at least 25% of the time. We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate; operations roles are especially difficult to support. But if we make you an offer, we will make every effort to get you into the United States, and we retain an immigration lawyer to help with this.
Role-specific policy: For this role, we prefer candidates who are able to be in our office more than 25% of the time, though we encourage you to apply even if you don’t think you will be able to do that.
We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.
Locations and experience:
San Francisco, CA, USA (8-10 years)
San Francisco, CA, USA (4-6 years)
San Francisco, CA, USA (4-6 years)
New York, NY, USA (4-6 years)