Research
We publish our research to advance the field and enable external scrutiny of our work. All papers are available on arXiv.
Constitutional AI: Training AI to be Helpful and Harmless
S. Chen, J. Wright, P. Patel et al.
We present Constitutional AI, a method for training AI assistants to be helpful, harmless, and honest by using an explicit set of principles to guide their behavior.
Scaling Monosemanticity: Extracting Interpretable Features from Large Language Models
E. Kowalski, M. Thompson et al.
We demonstrate techniques for identifying interpretable features in language models, enabling better understanding of model behavior.
Responsible Scaling Policy: A Framework for Safe AI Development
S. Chen, J. Wright et al.
We introduce our Responsible Scaling Policy, a framework for evaluating and mitigating risks as AI systems become more capable.
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
P. Patel, E. Kowalski et al.
We show that deceptive behaviors deliberately trained into LLMs can persist through standard safety training, with implications for AI safety evaluation.
Many-Shot Jailbreaking: Vulnerabilities in Context Window Scaling
M. Thompson, D. Okonkwo et al.
We identify a class of jailbreaking attacks that become more effective as context windows grow, and we propose mitigations.
Towards Robust Multimodal Reasoning
J. Wright, S. Chen et al.
We present advances in multimodal reasoning that combine visual and textual understanding for complex tasks.