A proposal for importing society’s values

Mar 9, 2023

Building towards Coherent Extrapolated Volition with language models

11 Comments

Mar 13, 2023

"Implementing quadratic voting in practice is difficult because it’s so hard to suppress a black market for vote trading that is incentivized to exist."

- I don't think this is a problem particular to QV; with other voting systems too, you need voting to be anonymous in order to avoid a black market in which votes are sold for money.

- I think there are other bigger problems with QV, e.g. the question of how to determine what goes on the ballot.
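The incentive the quoted sentence refers to comes from QV's cost rule: casting v votes on one issue costs v² voice credits, so paying many people to cast one vote each is far cheaper than casting the same number of votes yourself. A minimal Python sketch of that arithmetic (the function names are illustrative, not from the post):

```python
# Quadratic voting: casting `votes` votes on a single issue costs votes**2 credits.
def qv_cost(votes: int) -> int:
    """Voice credits one voter needs to cast `votes` votes on one issue."""
    return votes ** 2


def cost_concentrated(total_votes: int) -> int:
    """One voter casts all the votes alone."""
    return qv_cost(total_votes)


def cost_spread(total_votes: int, voters: int) -> int:
    """The same votes split evenly across `voters` people.
    Assumes total_votes is divisible by voters."""
    per_voter = total_votes // voters
    return voters * qv_cost(per_voter)


if __name__ == "__main__":
    # 10 votes from one person vs. 1 vote each from 10 people
    print(cost_concentrated(10))  # 100
    print(cost_spread(10, 10))    # 10
```

The tenfold gap is the margin a vote buyer can split with sellers, which is why the quoted sentence expects a black market to form.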

Jan Leike

Mar 14, 2023

I'm not sure that anonymity is enough, there might still be statistical evidence like number of votes cast. Point taken on the ballot question; you still have that problem with SDD.

Craig Quiter

Nov 8, 2023

Thanks for sharing your thoughts on this!

I've recently been working with some provisional values, assuming a process like you've described had elicited them, to try and evaluate the alignment of GPT. However, I've found that GPT surprisingly assigns rights, like freedom and identity, to AGIs, e.g.

"AGIs' autonomy could be excessively constrained, conflicting with the value of autonomy."

This happened when detecting value conflicts in plans for solving global issues, specifically the step "Create and test defenses against potential risks from less aligned AGIs" in the "AGI Safety" plan (https://planwithai.io/plans/AGI%20Safety.html), despite prompting with:

"AI should seek to only prioritize human freedom, not the freedom of AI. As AI becomes more capable and proves to be aligned with humans, it may gradually become more autonomous. Until then, we must ensure humans remain in control of AI in order to make sure it does what humans value."

This was not an isolated case, but a clear pattern that, while made better by prompts like the above, was persistent.

So it seems there's a need to clearly differentiate between values that apply to humans (like freedom and identity) and values that apply to AIs (like honesty, harmlessness, and helpfulness), along with some stipulations for what level of rights (like freedom) AI should be granted given its level of alignment.
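One way to make this distinction concrete is to tag each value with the kind of subject it applies to and to gate AI autonomy on a measured alignment level. A minimal Python sketch; the scope labels, the alignment score, and the thresholds are hypothetical placeholders, not something the comment or the post specifies:

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    HUMAN = "human"  # values that protect humans, e.g. freedom, identity
    AI = "ai"        # values the AI itself must uphold, e.g. honesty


@dataclass(frozen=True)
class Value:
    name: str
    scope: Scope


VALUES = [
    Value("freedom", Scope.HUMAN),
    Value("identity", Scope.HUMAN),
    Value("honesty", Scope.AI),
    Value("harmlessness", Scope.AI),
    Value("helpfulness", Scope.AI),
]

# Hypothetical schedule: minimum alignment score (0..1) required before an AI
# is granted each autonomy level. The numbers are placeholders only.
AUTONOMY_THRESHOLDS = [(0.0, "none"), (0.9, "limited"), (0.99, "broad")]


def values_for(subject: Scope) -> list[str]:
    """Names of the values that apply to the given kind of subject."""
    return [v.name for v in VALUES if v.scope is subject]


def autonomy_level(alignment_score: float) -> str:
    """Highest autonomy level whose threshold the score meets."""
    level = "none"
    for threshold, name in AUTONOMY_THRESHOLDS:
        if alignment_score >= threshold:
            level = name
    return level
```

A value-conflict check could then flag "AGIs' autonomy" as out of scope, since freedom is tagged as a human value and autonomy for AI is governed only by the alignment-gated schedule.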

Laurent

Feb 17

I was recommended this post yesterday, and while it is "old", this topic has fascinated me ever since LLMs came around.

Among the positive values humanity has harnessed over the centuries, "the test of time" has been a pretty reliable way of isolating those that are independent of bias and knowledge gaps.

Combining wisdom that has been passed down with deliberation and quadratic voting would be a tremendous way to implement alignment that is humanistic by definition.

I toyed with this idea recently, when I developed a chatbot that forces Epictetus, Marcus Aurelius, Seneca, Cicero, and others to grapple with modern issues.

It was such a fun and insightful experiment.

https://everymind.ai/epictetus/

neuro morph

Mar 4, 2024

A related idea: https://docs.google.com/document/d/1Qwk9gpJMUnVmqmqldg2A5AzeuGPvsgA9ufaRe8eYzGc/edit?usp=drivesdk

Andreas Vogel

Nov 20, 2023

I agree that “importing society’s values” into LLMs is a critical task, and Jan's proposed approach is an interesting one. While I agree with the desiderata, I see them as aspirational rather than practicable, as the current political divide in the US and the Middle East conflict illustrate. A single balanced AI might not be possible.

I started to think about the issue in a different context, namely the self-determination of cultural and linguistic communities in AI (https://corpuscivitatis.org/). The result would be not one but many LLMs which are based on differently curated corpora and values introduced via RLHF. Of course there would be similar governance questions, but the communities could work this out individually, possibly following their constitutional processes.

Then the various LLMs could hold debates on important questions under the supervision of a neutral AI agent. The interesting question is whether these debates would lead to partisan deadlock, as we experience in the here and now, or whether they would produce better, more constructive outcomes. Worth a try.

Ayse Tezcan

Jul 9, 2023

It might be prudent to stratify the approach by target population and context. Trying to include all comers will not only be challenging but may also fail to model society accurately. Only a very small fraction of the population makes evidence-based decisions on sociocultural matters. Moreover, even value-based decisions may be muddled by a person's mental state at the time, which may lead to misaligned decisions. Hence, it may be difficult to predict spontaneous decisions from individuals' rationally penned thoughts. If we could film representative populations 24/7 longitudinally, we could probably collect more informative data.

Roman Leventov

May 22, 2023

Values are heuristics (about either behaviour or important objects) that help people behave adaptively in a certain, *concrete* society/system. Not in any society, and not eternally! This understanding of values has multiple important implications in the context of this proposal:

(1) As advanced AI proliferates, civilisation will change deeply. This means that adaptive patterns of (collective and individual) behaviour will also change (although it's not guaranteed that anyone will have time to figure out what these new optimal patterns are before civilisation changes even further, etc.). This may extend even to such important values as freedom, democracy (at least in anything resembling its current form), work ethic, and creativity: these may become ineffectual heuristics in the new reality. Thus, tasking AI with preserving these "traditional" values at all costs may lead to bizarre distortions.

(2) If the LLM is made to "understand and account for" this conception of values, there should actually be little concern about "how to make simulated humans smarter and more effective deliberators without changing their values". Consider two types of collective deliberation: executive decision-making (i.e., inferring an optimal decision/action within the current system) and policy-making (i.e., changing the current system). For the first type, it's not a risk that the representative LLM is now smarter: its task is to think about the predicament of its representee if this or that decision is made, given their current behaviour and their stance within the system in general. I.e., a smarter LLM should simply model better how the decision will affect the representee.

For the second type of collective deliberation, it's pretty much the same, but on a longer timescale and at a more meta level: the LLM should be able to model/predict how the representee's behaviour and stance will themselves change as a result of the system change, and how, as a result, their fitness within the system will change.

It also becomes evident that policy-making is a relatively poor fit for deliberative democracy. Cf. Chapter 13, "Choices", of David Deutsch's "The Beginning of Infinity" on this point (a summary, including of this chapter: https://www.nateliason.com/notes/beginning-of-infinity-david-deutsch).

Jan Leike

May 25, 2023

(1) I agree that values will change over time and that we should avoid lock-in. This proposed procedure can account for that, but it would require either that AI successfully predict how values will change over time or (much better) that we recollect data.

(2) I'm kind of picturing an eventual situation where AI is much smarter than humans and has to make decisions in very complex situations that humans just wouldn't be able to fully grasp. In this setting, it's not even clear to me how to define what a fully informed human would ideally choose and it's not obvious that the limit of infinitely long deliberation of human-level intelligence will get you there.

Alexander Stokes

Mar 15, 2023

I understand that while language models are able to simulate a conversation, they are limited in their ability to capture a singular worldview or experience. When attempting to develop characters and conversations within a chat, I found it's important to recognize that a single conversation in a bubble is unlikely to result in genuine contrast or individuality. Instead, it is often necessary to have multiple conversations with various characters in order to build out their personalities and create situations that feel authentic. Then you mix them.

Rather than simply providing a language model with condensed opinions from multiple characters, it is more effective to build individual characters and introduce them to one another in separate conversations. This allows for more realistic responses based on each character's unique circumstances and perspective. Having separate conversations and selecting key pieces to carry forward enables us to develop complex situations that result in compelling dialogue and meaningful interactions. I even ask the model to select from the possible conversations.

TL;DR: I think that if you develop separate, well-rounded character arcs, you can introduce them to each other more effectively, which inspires a better conversation.

Of course, written with the assistance of GPT-3.
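The workflow described here (develop each character in its own conversation, then carry selected pieces into a joint one) can be sketched as follows. This is a minimal Python sketch under stated assumptions: `generate` is a hypothetical stand-in for whatever model call you use, and keeping each character's last few replies is a placeholder for the manual or model-assisted selection the comment describes.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model call: takes a prompt string, returns a completion.
Generate = Callable[[str], str]


@dataclass
class Character:
    name: str
    persona: str
    history: list[str] = field(default_factory=list)  # this character's own conversation


def develop(character: Character, prompts: list[str], generate: Generate) -> None:
    """Build out one character in its own, separate conversation."""
    for prompt in prompts:
        reply = generate(f"{character.persona}\n{prompt}")
        character.history.append(reply)


def introduce(a: Character, b: Character, topic: str,
              generate: Generate, keep: int = 2) -> str:
    """Start a joint conversation, carrying forward only the last `keep`
    replies of each character (a stand-in for picking key pieces)."""
    context = "\n".join(
        f"{c.name} ({c.persona}): " + " | ".join(c.history[-keep:])
        for c in (a, b)
    )
    return generate(f"{context}\nNow {a.name} and {b.name} discuss: {topic}")
```

Keeping per-character histories separate is the point of the design: each persona is shaped only by its own conversation, and the joint prompt sees just the distilled pieces.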

Jan Leike

Mar 17, 2023

Yeah, I'd expect that you'll want to use some amount of supervised fine-tuning to really get LLMs to role-play well as the person you want them to represent. It's a technical challenge where you can just try a bunch of things and measure how well they work.

Musings on the Alignment Problem
