Simplicity arguments for scheming (Section 4.3 of "Scheming AIs")
This is section 4.3 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”
Text of the report here: https://arxiv.org/abs/2311.08379
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Chapters
1. Simplicity arguments for scheming (Section 4.3 of "Scheming AIs") (00:00:00)
2. 4.3 Simplicity arguments (00:00:28)
3. 4.3.1 What is “simplicity”? (00:00:40)
4. 4.3.2 Does SGD select for simplicity? (00:04:24)
5. 4.3.3 The simplicity advantages of schemer-like goals (00:05:53)
6. 4.3.4 How big are these simplicity advantages? (00:07:57)
7. 4.3.5 Does this sort of simplicity-focused argument make plausible predictions about the sort (00:16:25)
8. 4.3.6 Overall assessment of simplicity arguments (00:18:53)