LW - Singular learning theory: exercises by Zach Furman

Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Singular learning theory: exercises, published by Zach Furman on August 30, 2024 on LessWrong.
Thanks to Jesse Hoogland and George Wang for feedback on these exercises.
In learning singular learning theory (SLT), I found it was often much easier to understand by working through examples, rather than trying to work through the (fairly technical) theorems in their full generality. These exercises are an attempt to collect the sorts of examples that I worked through to understand SLT.
Before doing these exercises, you should have read the Distilling Singular Learning Theory (DSLT) sequence, watched the SLT summit YouTube videos, or studied something equivalent. DSLT is a good reference to keep open while solving these problems, perhaps alongside Watanabe's textbook, the Gray Book.
Note that some of these exercises cover the basics, which are well-covered in the above distillations, but some cover material that will likely be new to you (because it's buried deep in a textbook, found only in adjacent literature, etc.).
Exercises are presented mostly in conceptual order: later exercises freely use concepts developed in earlier exercises. Starred (*) exercises are what I consider the most essential exercises, and the ones I recommend you complete first.
1. *The normal distribution, like most classical statistical models, is a regular (i.e. non-singular[1]) statistical model. A univariate normal model with unit variance and mean $\mu \in \mathbb{R}$ is given by the probability density $p(x|\mu) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(x - \mu)^2\right)$. Assume the true distribution $q(x)$ of the data is realizable by the model: that is, $q(x) = p(x|\mu_0)$ for some true parameter $\mu_0$.
a. Calculate the Fisher information matrix of this model (note that since we have only a single parameter, the FIM will be a 1x1 matrix). Use this to show the model is regular.
b. Write an explicit expression for the KL divergence $K(\mu)$ between $q(x)$ and $p(x|\mu)$, as a function of the parameter $\mu$. This quantity is sometimes also called the population loss. [See Example 1.1, Gray Book, for the case of a 2D normal distribution]
c. Using $K(\mu)$ from b), give an explicit formula for the volume of "almost optimal" parameters, $V(\epsilon) = \int_{\{\mu : K(\mu) < \epsilon\}} \varphi(\mu)\,d\mu$, where $\varphi(\mu)$ is the prior distribution. For convenience, let $\varphi(\mu)$ be the improper prior $\varphi(\mu) = 1$.
d. The volume scaling formula for the learning coefficient $\lambda$ (also known as the RLCT[2]) is $\lambda = \lim_{\epsilon \to 0} \frac{\log(V(a\epsilon)/V(\epsilon))}{\log a}$ for any $a \neq 1$ [Theorem 7.1, Gray Book]. Using this formula, combined with the expression for $V(\epsilon)$ derived in c), calculate the learning coefficient[3]. Given that the model is regular, we expect the learning coefficient to be $\frac{d}{2} = \frac{1}{2}$; compare your answer. (A verification sketch in Python follows this exercise.)
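To make this concrete, here is a minimal verification sketch for exercise 1 (an illustrative addition, not part of the original post), using Python with sympy. It assumes the standard answers noted in the comments: Fisher information identically 1, $K(\mu) = \frac{1}{2}(\mu - \mu_0)^2$, and $V(\epsilon) = 2\sqrt{2\epsilon}$, which the volume-scaling formula turns into $\lambda = \frac{1}{2}$.

```python
# Verification sketch for exercise 1 (illustrative addition, not from the post).
import sympy as sp
from sympy.stats import Normal, E

mu, mu0 = sp.symbols("mu mu0", real=True)
X = Normal("X", mu0, 1)  # X ~ q(x) = p(x|mu0), the true distribution

def log_p(x, m):
    # Log-density of the unit-variance normal model p(x|m).
    return -sp.Rational(1, 2) * (x - m) ** 2 - sp.Rational(1, 2) * sp.log(2 * sp.pi)

# (a) Fisher information: E[(d/dm log p(X|m))^2], evaluated at m = mu0.
score = sp.diff(log_p(X, mu), mu)
fisher = sp.simplify(E(score**2)).subs(mu, mu0)
print(fisher)  # 1 -- strictly positive, so the model is regular

# (b) KL divergence K(mu) = E_q[log q(X) - log p(X|mu)].
K = sp.simplify(E(log_p(X, mu0) - log_p(X, mu)))
print(K)  # (mu - mu0)**2 / 2

# (c) With the improper prior phi(mu) = 1, {K < eps} is the interval
#     |mu - mu0| < sqrt(2*eps), so V(eps) = 2*sqrt(2*eps).
# (d) The volume-scaling formula then gives the learning coefficient.
a, eps = sp.symbols("a epsilon", positive=True)
V = lambda e: 2 * sp.sqrt(2 * e)
lam = sp.simplify(sp.log(V(a * eps) / V(eps)) / sp.log(a))
print(lam)  # 1/2, matching d/2 for a regular one-parameter model
```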
2. *We can make the normal distribution a singular model by changing the parameterization. Let a cubicly-parameterized normal model be the model $p(x|\mu) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(x - \mu^3)^2\right)$. Assume the true parameter is $\mu_0$.
a. Show that the cubicly-parameterized normal model is just as expressive as an ordinary normal model: that is, they both can express all univariate normal distributions.
b. Repeat 1a) with this model; calculate the Fisher information matrix to demonstrate that the model is singular, and find which parameters $\mu$ are singular.
c. Repeat 1b) - 1d) to calculate the learning coefficient of this model, for $\mu_0 = 0$ and for $\mu_0 \neq 0$.
Recall that the learning coefficient is a volume-scaling exponent, such that $V(\epsilon) \propto \epsilon^{\lambda}$[4] as $\epsilon \to 0$. Based on this, interpret your results. How does this make the cubicly-parameterized normal model different from the ordinary normal model?
d. Instead of taking $\epsilon \to 0$ to get the learning coefficient, fix a small but nonzero value of $\epsilon$, such as $\epsilon = 0.01$. As we saw in c), the learning coefficient changes discontinuously at $\mu_0 = 0$ - what happens to $V(\epsilon)$ as $\mu_0$ gets close to zero? What changes if you make $\epsilon$ smaller or larger? (See the numerical sketch after this exercise.)
Even though the asymptotic learning coefficien...
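As a numerical companion to exercise 2 (again an illustrative addition, not from the original post), the sketch below uses the closed form $K(\mu) = \frac{1}{2}(\mu^3 - \mu_0^3)^2$, so that $\{K < \epsilon\}$ is the interval $\mu^3 \in (\mu_0^3 - \sqrt{2\epsilon},\ \mu_0^3 + \sqrt{2\epsilon})$, and evaluates the finite-$\epsilon$ volume-scaling exponent for parts c) and d).

```python
# Numerical sketch for exercise 2 (illustrative addition, not from the post).
import numpy as np

def volume(eps, mu0):
    # V(eps) under the improper prior phi(mu) = 1; {K < eps} is the interval
    # mu in (cbrt(mu0^3 - sqrt(2*eps)), cbrt(mu0^3 + sqrt(2*eps))).
    # np.cbrt is the real cube root, valid for negative arguments.
    r = np.sqrt(2.0 * eps)
    return np.cbrt(mu0**3 + r) - np.cbrt(mu0**3 - r)

def lambda_hat(eps, mu0, a=2.0):
    # Finite-eps volume-scaling exponent log(V(a*eps)/V(eps)) / log(a).
    return np.log(volume(a * eps, mu0) / volume(eps, mu0)) / np.log(a)

for mu0 in [0.0, 0.5]:
    for eps in [1e-2, 1e-6, 1e-10]:
        print(f"mu0={mu0}, eps={eps:g}: lambda_hat = {lambda_hat(eps, mu0):.3f}")

# Expected behavior: at mu0 = 0 the exponent is exactly 1/6 at every eps;
# at mu0 = 0.5 it approaches 1/2 as eps -> 0, but moves back toward 1/6 when
# eps is large relative to mu0 - the finite-eps effect that part d) probes.
```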