LW - Principles for the AGI Race by William S
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Principles for the AGI Race, published by William S on August 30, 2024 on LessWrong.
Crossposted from https://williamrsaunders.substack.com/p/principles-for-the-agi-race
Why form principles for the AGI Race?
I worked at OpenAI for 3 years, on the Alignment and Superalignment teams. Our goal was to prepare for the possibility that OpenAI succeeded in its stated mission of building AGI (Artificial General Intelligence, roughly able to do most things a human can do) and then proceeded to make systems smarter than most humans. This will predictably raise novel problems in controlling and shaping systems smarter than their supervisors and creators, problems which we don't currently know how to solve.
It's not clear when this will happen, but a number of people would throw around estimates of this happening within a few years.
While there, I would sometimes dream about what would have happened if I'd been a nuclear physicist in the 1940s. I do think that many of the kind of people who get involved in the effective altruism movement would have joined: naive but clever technologists worried about the consequences of a dangerous new technology.
Maybe I would have followed them, and joined the Manhattan Project with the goal of preventing a world where Hitler could threaten the world with a new magnitude of destructive power. The nightmare is that I would have watched the fallout of the bombings of Hiroshima and Nagasaki with a growing, gnawing, panicked horror in the pit of my stomach, knowing that I had some small share of the responsibility.
Maybe, like Albert Einstein, I would have been unable to join the project due to a history of pacifism. If I had joined, I like to think that I would have joined the ranks of Joseph Rotblat and resigned once it became clear that Hitler would not get the Atomic Bomb. Or joined the signatories of the Szilárd petition requesting that the bomb only be used after terms of surrender had been publicly offered to Japan. Maybe I would have done something to try to wake up before the finale of the nightmare.
I don't know what I would have done in a different time and place, facing different threats to the world. But as I've found myself entangled in the ongoing race to build AGI, it feels important to reflect on the lessons to learn from history. I can imagine this alter ego of myself and try to reflect on how I could take the right actions in both this counterfactual world and the one I find myself in now.
In particular, what could guide me to the right path even when I'm biased, subtly influenced by the people around me, misinformed, or deliberately manipulated?
Simply trying to pick the action you think will lead to the best consequences for the world fails to capture the ways in which your model of the world is wrong, or your own thinking is corrupt. Joining the Manhattan Project and using the weapons on Japan both had plausible consequentialist arguments supporting them, ostensibly inviting a lesser horror into the world to prevent a greater one.
Instead, I think the best guiding star to follow is reflecting on principles: rules which apply in a variety of possible worlds, including worlds in which you are wrong. Principles that help you gather the right information about the world. Principles that limit the downsides if you're wrong. Principles that help you tell whether you're in a world where racing to build a dangerous technology first is the best path, or a world where it's a hubristic self-delusion.
This matches more with the idea of rule consequentialism than pure act consequentialism: instead of making each decision based on what you think is best, think about what rules would be good for people to adopt if they were in a similar situation.
My goal in imagining these principles is to find principles that prevent errors of the following forms...