Musings on the Alignment Problem
Should we control AI instead of aligning it?
(Spoiler: no)
Jan 24 • Jan Leike
November 2024
Crisp and fuzzy tasks
Why fuzzy tasks matter and how to align models on them
Nov 22, 2024 • Jan Leike
Two alignment threat models
Why under-elicitation and scheming are both important to address
Nov 8, 2024 • Jan Leike
December 2023
Combining weak-to-strong generalization with scalable oversight
A high-level view on how this new approach fits into our alignment plans
Dec 20, 2023 • Jan Leike
September 2023
Self-exfiltration is a key dangerous capability
We need to measure whether LLMs could “steal” themselves
Sep 13, 2023 • Jan Leike
March 2023
A proposal for importing society’s values
Building towards Coherent Extrapolated Volition with language models
Mar 9, 2023 • Jan Leike
December 2022
Distinguishing three alignment taxes
The impact of different alignment taxes depends on the context
Dec 19, 2022 • Jan Leike
Why I’m optimistic about our alignment approach
Some arguments in favor and responses to common objections
Dec 5, 2022 • Jan Leike
September 2022
What could a solution to the alignment problem look like?
A high-level view on the elusive once-and-for-all solution
Sep 27, 2022 • Jan Leike
May 2022
What is inner alignment?
An explanation using the language of machine learning
May 8, 2022 • Jan Leike
March 2022
A minimal viable product for alignment
Bootstrapping a solution to the alignment problem
Mar 29, 2022 • Jan Leike
Why I’m excited about AI-assisted human feedback
How to scale alignment techniques to hard tasks
Mar 29, 2022 • Jan Leike