5 Comments
Feb 14, 2023 · Liked by Jan Leike

I didn't understand this part; can you elaborate or try to explain differently?

"We need to do a fixed amount of alignment work for each new generation of AI systems, for example when going from GPT-2 to GPT-3. In this case the alignment tax that we can sustain depends on how much work needs to be done. For example, if the “pre-tax” compute cost of the automated alignment work is 1% of the development cost of the new AI system, then a 1,000% tax only brings the total alignment cost to 11% of the overall cost of the AI system. However, this only works if the (object level) performance tax on the next generation isn’t much higher than the performance tax on the current generation, otherwise performance taxes will end up compounding from generation to generation."
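The arithmetic in the quoted passage can be checked with a quick sketch. Assuming, as the quote does, that the "pre-tax" automated alignment work costs 1% of the new system's development cost, a 1,000% tax multiplies that by (1 + 10):

```python
# Sketch of the alignment-tax arithmetic from the quoted passage.
# Assumptions (from the quote): pre-tax alignment compute is 1% of
# development cost, and a 1,000% tax means paying 10x on top of it.
pre_tax_fraction = 0.01   # alignment work as a fraction of dev cost
tax_rate = 10.0           # 1,000% expressed as a multiplier on top

total_fraction = pre_tax_fraction * (1 + tax_rate)
print(f"{total_fraction:.0%}")  # prints 11%
```

So even a very large tax on a small base stays a small share of the overall cost, as long as it doesn't compound across generations.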


I like this taxonomy! And I like your point about performance tax not mattering so much for automated alignment research. I don't have anything substantive to add, but I'll contribute by linking to these other posts that also attempt to taxonomize alignment taxes / kinds of competitiveness. https://www.alignmentforum.org/posts/sD6KuprcS3PFym2eM/three-kinds-of-competitiveness https://www.alignmentforum.org/posts/fRsjBseRuvRhMPPE5/an-overview-of-11-proposals-for-building-safe-advanced-ai
