Discussion about this post

User's avatar
Ash Jafari's avatar

Thanks Jan so much for the work you and your team are undertaking. Hopefully in a decade or two, AI alignment researchers like yourselves are going to be considered heroes like the astronauts were in the space race. Three questions for you:

1. What do you make of the following paper and the general argument that in the end, we cannot control/align an intelligence that is superior to humans: (https://journals.riverpublishers.com/index.php/JCSANDM/article/view/16219)?

2. There is a lot of interest by billionaire funders and the effective altruist movement to dramatically increase the funding and resourcing for AI safety/alignment. I've gathered that funding is no longer the rate limiter but AI alignment researchers are the bottleneck. Is that your view? What can be done to re-skill or re-orient PhDs and academics?

3. Related to #2, how much would we have to scale up the AI alignment research personnel so that you feel you can meet and handle the progress towards AGI? For example, would a 2x, 5x, or 10x scale up make you feel AI alignment is no longer the bottleneck?

Thank you!

Anurag's avatar

"We develop a formal theory for alignment that captures what it means for a system to be aligned with a principal (the human user)."

Three years later, I found myself circling the same question and similarly unable to find a fully satisfying answer. I eventually settled on the following framing for my own peace of mind...

Modern AI systems occupy a three-dimensional space defined by varying degrees of Beingness, Cognition, and Intelligence. Each of these dimnsions can be characterized by evaluating systems on specific properties (each property might require its own evaluation ingenuity). And different types of alignment risks emerge at different points in the cubical space.

I have explained the framework in a series of articles here "A Structural Theory of AI Alignment": https://www.lesswrong.com/s/agaGwGjvoW6qRQLsL

It culminates in an interactive 3D-view generated for the purpose : https://ai-alignment-risks-conceptual-space-644614615348.us-west1.run.app/

In other related posts, I also explore tentative evaluation approaches for some of these properties, and possible ways systems might be improved along them.

9 more comments...

No posts

Ready for more?