LW - If we solve alignment, do we die anyway? by Seth Herd

5:30
 
Manage episode 435817611 series 3337129
Content provided by The Nonlinear Fund. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without permission, you can follow the process outlined here: https://th.player.fm/legal
Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: If we solve alignment, do we die anyway?, published by Seth Herd on August 23, 2024 on LessWrong.
I'm aware of good arguments that this scenario isn't inevitable, but it still seems frighteningly likely even if we solve technical alignment.
TL;DR:
1. If we solve alignment, it will probably be used to create AGI that follows human orders.
2. If takeoff is slow-ish, a pivotal act (preventing more AGIs from being developed) will be difficult.
3. If no pivotal act is performed, RSI-capable AGI proliferates. This creates an N-way non-iterated Prisoner's Dilemma where the first to attack wins.
4. Disaster results.
The first AGIs will probably be aligned to take orders
People in charge of AGI projects like power. And by definition, they like their values somewhat better than the aggregate values of all of humanity. It also seems like there's a pretty strong argument that Instruction-following AGI is easier than value aligned AGI. In the slow-ish takeoff we expect, this alignment target seems to allow for error-correcting alignment, in somewhat non-obvious ways.
If this argument holds up even weakly, it will be an excuse for the people in charge to do what they want to anyway.
I hope I'm wrong and value-aligned AGI is just as easy and likely. But it seems like wishful thinking at this point.
The first AGI probably won't perform a pivotal act
In realistically slow takeoff scenarios, the AGI won't be able to do anything like make nanobots to melt down GPUs. It would have to use more conventional methods, like software intrusion to sabotage existing projects, followed by elaborate monitoring to prevent new ones. Such a weak attempted pivotal act could fail, or could escalate to a nuclear conflict.
Second, the humans in charge of AGI may not have the chutzpah to even try such a thing. Taking over the world is not for the faint of heart. They might get it after their increasingly-intelligent AGI carefully explains to them the consequences of allowing AGI proliferation, or they might not. If the people in charge are a government, the odds of such an action go up, but so do the risks of escalation to nuclear war. Governments seem to be fairly risk-taking.
Expecting governments to not just grab world-changing power while they can seems naive, so this is my median scenario.
So RSI-capable AGI may proliferate until a disaster occurs
If we solve alignment and create personal intent aligned AGI but nobody manages a pivotal act, I see a likely future world with an increasing number of AGIs capable of recursively self-improving. How long until someone tells their AGI to hide, self-improve, and take over?
Many people seem optimistic about this scenario. Perhaps network security can be improved with AGIs on the job. But AGIs can do an end-run around the entire system: hide, set up self-replicating manufacturing (robotics is rapidly improving to allow this), use that to recursively self-improve your intelligence, and develop new offensive strategies and capabilities until you've got one that will work within an acceptable level of viciousness.[1]
If hiding in factories isn't good enough, do your RSI manufacturing underground. If that's not good enough, do it as far from Earth as necessary. Take over with as little violence as you can manage or as much as you need. Reboot a new civilization if that's all you can manage while still acting before someone else does.
The first one to pull out all the stops probably wins. This looks all too much like a non-iterated Prisoner's Dilemma with N players - and N increasing.
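To make the "N increasing" point concrete, here is a minimal toy sketch (my own illustration, not from the original post): assume each RSI-capable project independently chooses to defect - hide, self-improve, and strike first - with some fixed probability p per year. Under that assumption, the chance that at least one of N projects defects is 1 - (1 - p)^N, which climbs toward certainty as N grows.

```python
# Toy model (illustrative assumptions, not claims from the post):
# each of N AGI projects independently defects with probability p per year.
# P(at least one defects) = 1 - (1 - p)**N, which approaches 1 as N grows.

def p_any_defects(n_projects: int, p_defect: float) -> float:
    """Probability that at least one of n_projects defects,
    assuming each defects independently with probability p_defect."""
    return 1 - (1 - p_defect) ** n_projects

if __name__ == "__main__":
    p = 0.03  # assumed 3% per project per year -- purely illustrative
    for n in (1, 5, 10, 50, 100):
        print(f"N = {n:3d}: P(at least one attacks) = {p_any_defects(n, p):.2f}")
```

With p = 0.03, for example, one project gives about a 3% chance per year, ten give roughly 26%, and a hundred give about 95%. The exact numbers are arbitrary; the shape of the curve is the point of the non-iterated-Prisoner's-Dilemma framing.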
Counterarguments/Outs
For small numbers of AGI and similar values among their wielders, a collective pivotal act could be performed. I place some hopes here, particularly if political pressure is applied in advance to aim for this outcome, or if the AGIs come up with better cooperation stru...
