Musings on the Alignment Problem
Self-exfiltration is a key dangerous capability
We need to measure whether LLMs could “steal” themselves
Sep 13 • Jan Leike
A proposal for importing society’s values
Building towards Coherent Extrapolated Volition with language models
Mar 9 • Jan Leike
Distinguishing three alignment taxes
The impact of different alignment taxes depends on the context
Dec 19, 2022 • Jan Leike
Why I’m optimistic about our alignment approach
Some arguments in favor and responses to common objections
Dec 5, 2022 • Jan Leike
What could a solution to the alignment problem look like?
A high-level view on the elusive once-and-for-all solution
Sep 27, 2022 • Jan Leike
What is inner alignment?
An explanation using the language of machine learning
May 8, 2022 • Jan Leike
A minimal viable product for alignment
Bootstrapping a solution to the alignment problem
Mar 29, 2022 • Jan Leike
Why I’m excited about AI-assisted human feedback
How to scale alignment techniques to hard tasks
Mar 29, 2022 • Jan Leike