We Need a Science of Evals
This article lays out a number of open questions in what the author calls a 'Science of Evals'.
Original text: https://www.apolloresearch.ai/blog/we-need-a-science-of-evals
Author(s): Apollo Research blog
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Chapters
1. We Need a Science of Evals (00:00:00)
2. What do we mean by “Science of Evals”? (00:04:58)
3. Maturation process of a field (00:06:41)
4. Current work in the direction of Science of Evals (00:10:23)
5. Next steps (00:14:01)
6. Field building (00:14:10)
7. Open research questions (00:16:11)
8. Conclusion (00:19:20)
85 episodes