Musings on the Alignment Problem
Combining weak-to-strong generalization with scalable oversight
A high-level view on how this new approach fits into our alignment plans
Dec 20, 2023 • Jan Leike
September 2023
Self-exfiltration is a key dangerous capability
We need to measure whether LLMs could “steal” themselves
Sep 13, 2023 • Jan Leike
March 2023
A proposal for importing society’s values
Building towards Coherent Extrapolated Volition with language models
Mar 9, 2023 • Jan Leike
December 2022
Distinguishing three alignment taxes
The impact of different alignment taxes depends on the context
Dec 19, 2022 • Jan Leike
Why I’m optimistic about our alignment approach
Some arguments in favor and responses to common objections
Dec 5, 2022 • Jan Leike
September 2022
What could a solution to the alignment problem look like?
A high-level view on the elusive once-and-for-all solution
Sep 27, 2022 • Jan Leike
May 2022
What is inner alignment?
An explanation using the language of machine learning
May 8, 2022 • Jan Leike
March 2022
A minimal viable product for alignment
Bootstrapping a solution to the alignment problem
Mar 29, 2022 • Jan Leike
Why I’m excited about AI-assisted human feedback
How to scale alignment techniques to hard tasks
Mar 29, 2022 • Jan Leike
What is the alignment problem?
My attempt at clarifying a confusing topic
Mar 29, 2022 • Jan Leike