11 Comments

"We want to train our models to always tell us the truth, even when we can’t check. The more ways we have to catch our models lying, the harder it is for them to lie."

Just something I'm curious about after reading this post and the paper that was published: part of the goal of alignment is making sure models are truthful (i.e., that they don't hallucinate). Another is to have them adhere to certain values, like equality, justice, etc. Does this fall under the same category for you, and is W2SG also effective in that sense?


I appreciate the work you're doing, Jan, and that of others focused on AI safety/alignment.

However, I again urge you and your colleagues to take on a big-picture assessment of the solution space available here. Is there a possible solution to superalignment that even begins to approach the certainty we'll need to actually have safe AGI/ASI?

How could we know? Do we know enough now to know the answer? As you know from our previous discussions, I am in the camp of "we already know enough to conclude that there is no solution to superalignment, since it is logically impossible for a vastly more-than-human intelligent entity to be controlled in any significant way by humans."

As such, I am now of the view that efforts on AI safety, conducted without a concurrent global pause in frontier model development, are simply enabling irresponsible AI development.

The recent turmoil at OpenAI is a pertinent example of the dangers of human messiness when dealing with massively dangerous tools.

An essay of mine on these issues is coming out in Scientific American shortly. I'll post it here when it does. I'd appreciate any further responses you have to my thoughts here.


Hi Jan, I would appreciate your response to my comments. These are, as you would agree, massively important issues that you have raised and that I have addressed in my comments below.


I'd appreciate your response to my comments and the SciAm column posted below, Jan. Thanks.


As promised, here's my latest essay at Scientific American on my view that AI safety research is at this point simply enabling irresponsible AI development: https://www.scientificamerican.com/article/ai-safety-research-only-enables-the-dangers-of-runaway-superintelligence/?fbclid=IwAR0wWOgveSakikNlVvjjTySm055ptchAUjC4a8gz1LJTVf2Cla8yQdknnkY


How do you define human values and goals in the context of the non-dual nature of the universe, and how do you discern whether they are beneficial or malevolent without resorting to dualistic perspectives?
