Speed arguments against scheming (Section 4.4-4.7 of "Scheming AIs")
This is section 4.4 through 4.7 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”
Text of the report here: https://arxiv.org/abs/2311.08379
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Chapters
1. Speed arguments against scheming (Section 4.4-4.7 of "Scheming AIs") (00:00:00)
2. 4.4 Speed arguments (00:00:29)
3. 4.4.1 How big are the absolute costs of this extra reasoning? (00:02:22)
4. 4.4.2 How big are the costs of this extra reasoning relative to the simplicity benefits of (00:07:06)
5. 4.4.3 Can we actively shape training to bias towards speed over simplicity? (00:09:21)
6. 4.5 The “not-your-passion” argument (00:10:27)
7. 4.6 The relevance of “slack” to these arguments (00:12:46)
8. 4.7 Takeaways re: arguments that focus on the final properties of the model (00:13:38)
63 episodes