Combining weak-to-strong generalization with scalable oversight
Self-exfiltration is a key dangerous capability
A proposal for importing society’s values