A taxonomy of non-schemer models (Section 1.2 of "Scheming AIs")
This is section 1.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”
Text of the report here: https://arxiv.org/abs/2311.08379
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Chapters
1. A taxonomy of non-schemer models (Section 1.2 of "Scheming AIs") (00:00:00)
2. 1.2 Other models training might produce (00:00:36)
3. 1.2.1 Terminal training-gamers (or, “reward-on-the-episode seekers”) (00:01:15)
4. 1.2.2 Models that aren’t playing the training game (00:04:12)
5. 1.2.2.1 Training saints (00:04:50)
6. 1.2.2.2 Misgeneralized non-training-gamers (00:06:17)
7. 1.2.3 Contra “internal” vs. “corrigible” alignment (00:09:22)
8. 1.2.4 The overall taxonomy (00:10:15)