Intro to Brain-Like-AGI Safety

AI Safety Fundamentals: Alignment

Content provided by BlueDot Impact. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by BlueDot Impact or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://th.player.fm/legal

Published 5 months ago · Duration 1:02:10

(Sections 3.1-3.4, 6.1-6.2, and 7.1-7.5)

Suppose we someday build an Artificial General Intelligence algorithm using similar principles of learning and cognition as the human brain. How would we use such an algorithm safely?

I will argue that this is an open technical problem, and my goal in this post series is to bring readers with no prior knowledge all the way up to the front-line of unsolved problems as I see them.

If this whole thing seems weird or stupid, you should start right in on Post #1, which contains definitions, background, and motivation. Then Posts #2–#7 are mainly neuroscience, and Posts #8–#15 are more directly about AGI safety, ending with a list of open questions and advice for getting involved in the field.

Source:

https://www.lesswrong.com/s/HzcM2dkCq7fwXBej8

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.

Chapters

1. Intro to Brain-Like-AGI Safety (00:00:00)

2. 3. Two subsystems: Learning & Steering (00:00:14)

3. 3.1 Post summary / Table of contents (00:00:19)

4. 3.2 Big picture (00:04:07)

5. 3.2.1 Each subsystem generally needs its own sensory processor (00:09:35)

6. 3.3 “Triune Brain Theory” is wrong, but let’s not throw out the baby with the bathwater (00:12:27)

7. 3.4 Three types of ingredients in a Steering Subsystem (00:16:35)

8. 3.4.1 Summary table (00:16:44)

9. 3.4.2 Aside: what do I mean by “drives”? (00:18:37)

10. 3.4.3 Category A: Things the Steering Subsystem needs to do in order to get general intelligence (e.g. curiosity drive) (00:20:46)

11. 3.4.4 Category B: Everything else in the human Steering Subsystem (e.g. altruism-related drives) (00:24:15)

12. 3.4.5 Category C: Every other possibility (e.g. drive to increase my bank account balance) (00:28:26)

13. 6. Big picture of motivation, decision-making, and RL (00:31:19)

14. 6.1 Post summary / Table of contents (00:31:30)

15. 6.2 Big picture (00:35:54)

16. 6.2.1 Relation to “two subsystems” (00:37:43)

17. 6.2.2 Quick run-through (00:38:41)

18. 7. From hardcoded drives to foresighted plans: A worked example (00:42:30)

19. 7.1 Post summary / Table of contents (00:42:43)

20. 7.2 Reminder from the previous post: big picture of motivation and decision-making (00:45:24)

21. 7.3 Building a probabilistic generative world-model in the cortex (00:46:21)

22. 7.4 Credit assignment when I first bite into the cake (00:48:40)

23. 7.5 Planning towards goals via reward-shaping (00:53:53)

24. 7.5.1 The other Thought Assessors. Or: The heroic feat of ordering a cake for next week, when you’re feeling nauseous right now (00:59:09)

83 episodes

#Tech #Society #Philosophy #Blue Dot Impact