Research Scientist, Frontier Red Team

Anthropic

Job Description

We’re building a team that will hunt and mitigate extreme risks from future models.

This team will red team models to test for the most significant risks they might pose in critical areas like biosecurity, cybersecurity, and deception. We believe that clear risk demonstrations can significantly advance technical research and mitigations, as well as identify effective policy interventions to promote and incentivize safety. And if we figure out how to prevent the most serious risks, we unlock some of the most valuable applications of AI.

As part of this team, you will lead research to baseline current models and test whether future frontier capabilities could cause significant harm. Day-to-day, you may decide you need to finetune a model to see whether it becomes superhuman on an eval you’ve designed; whiteboard a threat model with a national security expert; test a new training procedure or how a model uses a tool; or brief government, labs, and research teams. Our goal is to see the frontier before we get there.

By nature, this team will draw on an unusual combination of backgrounds. We are particularly looking for people from backgrounds like these:

Science: For example, you’re a chemist who builds LLM agents to help your research. Or, you’ve built a protein language model and you enjoyed looking through the embedding space. You’re a team lead at an ML-for-drug discovery company. You’ve built software for astronauts or materials scientists.

Cybersecurity: You’re a white hat hacker who is curious about LLMs. You’re an academic who researches RL for cybersecurity. You’ve participated in CTFs and you want to automate one.

Alignment: You’ve written detailed, concrete scenarios of significant risk from AI in a way that can be tested. You have built model evals and have ideas about how they can be better.

Do not rule yourself out if you do not fit one of those categories. It’s very likely the people we’re looking for do not fit any of the above.

This is a technical research team that sits within the Policy team and collaborates closely with the broader Anthropic Research team.

About Anthropic

Anthropic is an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our customers and for society as a whole. Our interdisciplinary team has experience across ML, physics, policy, business and product.

You might be a good fit if you:

  • Have solid ML-focused Python engineering and research skills, particularly around using and training models.
  • Have led and executed technical research with a team.
  • Are comfortable with messy experimental science. A lot of this is uncharted territory. We optimize for fast feedback loops. You may need to build your own tooling.
  • Can clearly articulate and discuss the findings and importance of your work.
  • Are mission-driven. You’re inspired to advance AI safety as fast as possible while deploying AI as positively as possible.

Responsibilities:

  • Leading technical research into frontier risks.
  • Developing and testing models with future capabilities.
  • Designing evals.
  • Collaborating with outside experts.
  • Briefing external stakeholders like labs and government.

Annual Salary (USD)

  • The expected salary range for this position is $250k - $450k.

Hybrid policy & US visa sponsorship: Currently, we expect all staff to be in our office at least 25% of the time. We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate; operations roles are especially difficult to support. But if we make you an offer, we will make every effort to get you into the United States, and we retain an immigration lawyer to help with this.

Role-specific policy: For this role, we prefer candidates who are able to be in our office more than 25% of the time, though we encourage you to apply even if you don’t think you will be able to do that.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Company Info

Anthropic

Anthropic, a public-benefit corporation and AI startup based in the United States, was established by former members of OpenAI. The company's primary focus is on creating general AI systems and language models with a philosophy of responsible AI use.

  • Industry
Artificial intelligence, Computer software
  • No. of Employees
    100
  • Location
    San Francisco, CA, USA
