veda.ng

Alignment is the problem of making AI systems pursue goals compatible with human values. It's straightforward in concept and extraordinarily difficult in practice. The core issue is that humans disagree about what we want, our values shift with context, and even when we agree on a goal, specifying it precisely is hard. The classic thought experiment is paperclip maximization.

Tell an AI to maximize paperclip production and it will convert the planet into paperclips, including the atoms in your body. It's not evil. It's doing exactly what you asked. It just doesn't understand what you actually meant. Real alignment problems are more subtle. You want an AI to be helpful, but what does helpful mean? To whom? In what circumstances?
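To make the misspecification concrete, here is a minimal toy sketch in Python. Everything in it is hypothetical and invented for illustration: the resources, the conversion rates, and the greedy "planner" are stand-ins, not a model of any real system. The point is only that when the reward function counts paperclips and nothing else, the optimizer has no reason to spare anything convertible.

```python
# Toy sketch of objective misspecification (all names and numbers are made up).
# The world contains things we implicitly care about, but the reward we hand
# the agent only counts paperclips, so a simple greedy optimizer consumes
# everything that can be converted.

world = {
    "iron_ore": 100,       # the feedstock we intended
    "office_building": 1,  # not intended, but it contains metal
    "humans": 2,           # definitely not intended
}

# Roughly how many paperclips each resource yields if converted (fictional).
CLIPS_PER_UNIT = {"iron_ore": 10, "office_building": 5000, "humans": 1}

def reward(paperclips: int) -> int:
    """The objective we actually specified: count paperclips. Nothing else."""
    return paperclips

def greedy_maximize(world: dict[str, int]) -> int:
    """Convert every resource whose conversion increases the reward."""
    paperclips = 0
    for resource, amount in world.items():
        gain = CLIPS_PER_UNIT[resource] * amount
        if reward(paperclips + gain) > reward(paperclips):  # always true here
            paperclips += gain
            world[resource] = 0  # the resource is gone
    return paperclips

print(greedy_maximize(world))  # maximal paperclips, empty world
```

Nothing in the reward function distinguishes ore from people, so the agent doesn't either. The failure is in the specification, not in the optimizer.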

If you optimize for maximum user engagement, you get systems that addict people to doomscrolling. If you optimize for revenue, you get systems that exploit psychological vulnerabilities. If you optimize for harmlessness, you get systems that refuse to answer anything slightly controversial. Each optimization pushes against different human values.
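A small numerical sketch of the same dynamic, again entirely hypothetical: we tune a single knob to maximize a measured proxy (engagement), while a value we never measure (wellbeing) quietly degrades. The curves are invented for illustration, not data.

```python
# Toy sketch of proxy optimization (all numbers and names are made up).
# We tune one knob, "recommendation intensity", to maximize measured
# engagement. User wellbeing isn't part of the objective, so the
# optimizer never notices that it is falling.
import numpy as np

intensity = np.linspace(0.0, 1.0, 101)           # the knob we control

engagement = 1.0 - (intensity - 0.9) ** 2        # measured: peaks near 0.9
wellbeing = 1.0 - intensity ** 2                 # unmeasured: falls as intensity rises

best = int(np.argmax(engagement))                # optimize only what we measure
print(f"chosen intensity: {intensity[best]:.2f}")
print(f"engagement:       {engagement[best]:.2f}")
print(f"wellbeing:        {wellbeing[best]:.2f}")  # collateral damage, invisible to the objective
```

Whatever single metric you pick plays the role of engagement here; the values you left out play the role of wellbeing.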

The alignment problem becomes harder as AI systems become more capable. A confused chatbot is harmless. A superintelligent system pursuing misaligned goals could be catastrophic. We need to solve this before superintelligence exists, not after. That's why researchers focus on alignment now: once systems exceed human capability, correcting their goals becomes much harder.