Discussion about this post

Stefanya Poesy:

It appears to me that your definition of alignment is slightly askew. In all your examples, the real issue is who the system is aligned TO.

The systems in question are performing in perfect alignment with something OTHER than your observer, which suggests that alignment, in the sense you mean, is relative.

The first question to ask, then, IMO, is whose desires and intentions the system was really designed to serve.

Miroslav:

I do understand the intention and the main point of the post, but this part is contradictory:

“… examples …

In each of these cases, we can be pretty confident that the problem wasn’t a capability problem: clearly the humans or systems in question here are capable of what you wanted them to do, they just decided not to.”

What does it mean that the systems are capable of doing it correctly and just decide not to? How can we evaluate this and be sure it has nothing to do with the system’s capability? The narrative can lead some non-researchers to overestimate the self-awareness capabilities of what is, for example, just a system of equations.
