Musings on the Alignment Problem
Self-exfiltration is a key dangerous capability
We need to measure whether LLMs could “steal” themselves
Sep 13 • Jan Leike
March 2023
A proposal for importing society’s values
Building towards Coherent Extrapolated Volition with language models
Mar 9 • Jan Leike
December 2022
Distinguishing three alignment taxes
The impact of different alignment taxes depends on the context
Dec 19, 2022 • Jan Leike
Why I’m optimistic about our alignment approach
Some arguments in favor and responses to common objections
Dec 5, 2022 • Jan Leike
September 2022
What could a solution to the alignment problem look like?
A high-level view on the elusive once-and-for-all solution
Sep 27, 2022 • Jan Leike
May 2022
What is inner alignment?
An explanation using the language of machine learning
May 8, 2022 • Jan Leike
March 2022
A minimal viable product for alignment
Bootstrapping a solution to the alignment problem
Mar 29, 2022 • Jan Leike
Why I’m excited about AI-assisted human feedback
How to scale alignment techniques to hard tasks
Mar 29, 2022 • Jan Leike
What is the alignment problem?
My attempt at clarifying a confusing topic
Mar 29, 2022 • Jan Leike