Alignment is the problem of making AI systems pursue goals compatible with human values. It is hard for fundamental reasons: humans disagree about what we want, our values shift with context, and even when we agree on a goal, specifying it precisely is difficult. The classic thought experiment is paperclip maximization.
Tell a sufficiently capable AI to maximize paperclip production and it converts the planet into paperclips, the atoms in your body included. Real alignment problems are more subtle. You want an AI to be helpful, but what does helpful mean? To whom? In what circumstances? If you optimize for maximum user engagement, you get systems that addict people to doomscrolling. If you optimize for revenue, you get systems that exploit psychological vulnerabilities. If you optimize for harmlessness, you get systems that refuse to answer anything remotely controversial. Each proxy objective pushes against a different set of human values. A confused chatbot is harmless. A superintelligent system pursuing misaligned goals could be catastrophic.
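To make this proxy-optimization failure concrete, here is a minimal sketch in Python. It assumes a purely hypothetical relationship in which moderate engagement benefits users but heavy engagement tips into compulsive doomscrolling; the curve is invented for illustration, not drawn from any real platform's data.

```python
import numpy as np

# Hypothetical toy model: true value (user wellbeing) rises with
# engagement at first, peaks, then falls as use turns compulsive.
# The proxy the optimizer actually sees (engagement) just keeps rising.
engagement = np.linspace(0, 10, 1001)             # e.g. hours per day
wellbeing = engagement * np.exp(-engagement / 3)  # invented curve, peaks at 3

human_optimum = engagement[np.argmax(wellbeing)]   # best for the user
proxy_optimum = engagement[np.argmax(engagement)]  # what the optimizer picks

print(f"engagement that maximizes wellbeing:  {human_optimum:.1f}")
print(f"engagement the proxy optimizer picks: {proxy_optimum:.1f}")
print(f"wellbeing at proxy optimum: {wellbeing[np.argmax(engagement)]:.2f} "
      f"vs. at human optimum: {wellbeing.max():.2f}")
```

The gap between the two optima is the alignment gap: the harder the system optimizes the proxy, the further it overshoots the value the proxy was meant to stand in for, a version of Goodhart's law.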
[Interactive visualizer: AI Alignment Challenge. Choose an AI objective and step through the world state to explore how goals that seem reasonable lead to unintended consequences when not properly aligned with human values.]
Key Insight: Even well-intentioned AI goals can lead to catastrophic outcomes when they capture only part of what humans value. The challenge is specifying what we actually want, not just what we think we want.