"We want to train our models to always tell us the truth, even when we can’t check. The more ways we have to catch our models lying, the harder it is for them to lie."
Just something I'm curious about after reading this post and the paper that was published: part of the goal of alignment is making sure models are truthful (i.e. don't hallucinate). Another is to have them adhere to/subscribe to certain values, like equality, justice, etc. Does this fall under the same category for you, and is W2SG also effective in that sense?
Appreciate the work you're doing, Jan, and others also who are focused on AI safety/alignment.
However, I again urge you and your colleagues to engage in the task of a big picture assessment of the possible solution spaces available here. Is there a possible solution to superalignment that even begins to approach the certainty we'll need to actually have safe AGI/ASI?
How could we know? Do we know enough now to know the answer? As you know from our previous discussions, I am in the camp of "we know enough already to know that there is no solution to superalignment, since it's logically impossible for a vastly more-than-human intelligent entity to be controlled in any significant way by humans."
As such, I am now of the view that efforts on AI safety, conducted without a concurrent global pause in frontier model development, are simply enabling irresponsible AI development.
The recent turmoil at OpenAI is a pertinent example of the dangers of human messiness when dealing with massively dangerous tools.
An essay of mine is coming out on these issues in Scientific American shortly. I'll post it here when it comes out. I'd appreciate any further responses you have to my thoughts here.
Yampolskiy's new book on uncontrollability came out recently: he and I have both attempted to engage with you and we remain open to further dialogue. Would be happy to schedule a Zoom call if you're interested. https://www.amazon.com/Unexplainable-Unpredictable-Uncontrollable-Artificial-Intelligence/dp/103257626X/ref=sr_1_1?crid=3MD3FUP9BIJ08&dib=eyJ2IjoiMSJ9.EUfId556iP3w3ngElTbcuITPA-Tj8tf5adlR1gTpk_ZCREWJyZMU-0W-fIa3KG4bEwZoLX_6KD_h9N4aeH3Nu92EEqdZXNgw2ivVMlK_jwrq-Lq0s9yS1Q4e-SG2a8-gEQPQv1LaOccDgOC_bG_njBTgv6TgxjBl2M0Sa6UsN_PczTAfEPZQBk10sGBA8Szy.NNGShTn1sUwE9qpAX7vCr2Mgf7vrS0DMpBsjdhg3kWU&dib_tag=se&keywords=roman+v.+yampolskiy&qid=1709420354&sprefix=yampolskiy%2Caps%2C789&sr=8-1
Hi Jan, I would appreciate your response to my comments. These are, as you would agree, massively important issues that you have raised and that I have addressed in my comments below.
I'd appreciate your response to my comments and SciAm column posted below, Jan, thanks.
As promised, here's my latest essay at Scientific American on my view that AI safety research is at this point simply enabling irresponsible AI development: https://www.scientificamerican.com/article/ai-safety-research-only-enables-the-dangers-of-runaway-superintelligence/?fbclid=IwAR0wWOgveSakikNlVvjjTySm055ptchAUjC4a8gz1LJTVf2Cla8yQdknnkY
How do you define human values and goals in the context of the non-dual nature of the universe, and how do you discern whether they are beneficial or malevolent without resorting to dualistic perspectives?