Discussion about this post

Jurgen Gravestein:

"We want to train our models to always tell us the truth, even when we can’t check. The more ways we have to catch our models lying, the harder it is for them to lie."

Just something I'm curious about after reading this post and the paper that was published: part of the goal of alignment is making sure models are truthful (i.e., that they don't hallucinate). Another is to have them adhere to certain values, like equality, justice, etc. Does this fall under the same category for you, and is W2SG also effective in that sense?

Michael Spencer:

Okay you can start up again now JL.
