It appears to me that your definition of alignment is slightly askew. In all your examples, the real issue is who the system is aligned TO.
The systems in question are performing in perfect alignment with a system OTHER than your observer, which suggests that alignment, in the sense you seem to mean, is relative.
The first question to ask then, IMO, is whose desires and intentions the system was really designed to serve.
Yeah, great point! For any system you could always try to figure out its objective (e.g. using inverse reinforcement learning) and then proclaim that the system is aligned with that objective. The problem I'm interested in is how we can design a system such that it's aligned with a provided objective. The question of what that provided objective should be is still really important, but I think it should be treated independently.
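To make that distinction concrete, here is a toy sketch (a one-step, bandit-style stand-in for IRL, not anything from the post; the feature names and numbers are made-up assumptions): we assume the system picks options Boltzmann-rationally, fit the reward weights its choices reveal, and compare them to the objective we provided.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each option is scored on two features: [value to the user, value to the provider].
# These options and numbers are purely illustrative.
options = np.array([
    [1.0, 0.0],   # best for the user
    [0.0, 1.0],   # best for the provider
    [0.5, 0.5],   # a compromise
    [0.0, 0.0],   # do nothing
])

provided_objective = np.array([1.0, 0.0])  # what we *asked* for: serve the user
actual_objective   = np.array([0.2, 1.0])  # what the system actually optimizes

def choice_probs(weights):
    """Boltzmann (softmax) choice probabilities over the options."""
    logits = options @ weights
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Observe 500 choices made by the system.
demos = rng.choice(len(options), size=500, p=choice_probs(actual_objective))

# Recover the revealed objective by maximum likelihood: gradient ascent on the
# softmax choice model (gradient = observed minus expected feature means).
w = np.zeros(2)
observed = options[demos].mean(axis=0)
for _ in range(5000):
    expected = choice_probs(w) @ options
    w += 0.1 * (observed - expected)

print("provided objective :", provided_objective)
print("revealed objective :", np.round(w, 2))  # should land near actual_objective, not provided_objective
```

The point of the toy: the fitting step tells you what objective the behaviour is consistent with, but it says nothing about how to build a system whose revealed objective matches the provided one, which is the design problem I care about.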
I think as long as there are conflicting interests and power differentials, there's little hope for systems to be unequivocally designed for the "users" rather than to benefit the providers' more or less self-serving agendas.
I do understand the intention and the main point of the post, but this part is contradictory:
“••• examples •••
In each of these cases, we can be pretty confident that the problem wasn’t a capability problem: clearly the humans or systems in question here are capable of what you wanted them to do, they just decided not to.”
What does it mean to say that a system is capable of doing the right thing and just decides not to? How can we evaluate this and be sure it has nothing to do with the system's capability? The narrative can lead some non-researchers to overestimate the self-awareness of what is, for example, just a system of equations.
Great points! We humans don't always align on many soft issues, and today's guidelines may change with emerging evidence; how can we expect better from AI performance? Can these systems successfully and unbiasedly navigate the mess of human cognitive output that litters an AI model's data feed? How do we lay down theory and causal inference as foundational principles of generative AI models? So many challenges to tackle, but exciting times for those who enjoy playing with hard questions.