The AdEMAMix Optimizer: Better, Faster, Older
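This paper argues that momentum-based optimizers, which rely on a single exponential moving average (EMA) of past gradients, cannot both weight recent gradients heavily and keep meaningful weight on older ones. It proposes AdEMAMix, which mixes a fast EMA and a slow EMA of the gradients, and reports better use of older gradients, faster convergence, and slower forgetting of training data during training.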
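The description only names the idea of mixing two EMAs. As a rough illustration, here is a minimal sketch of an Adam-style update that adds a slow EMA of the gradient, scaled by a mixing coefficient `alpha`, to the usual fast, bias-corrected EMA. It is one reading of the idea, not the paper's reference code. Several parts are assumptions: the helper names (`init_state`, `ademamix_step`), the default values of `beta3` and `alpha`, leaving the slow EMA without bias correction, and the AdamW-style weight decay term. Any warmup schedules for `alpha` or `beta3` are also omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class AdEMAMixState:
    """Per-parameter optimizer state (illustrative, not the authors' code)."""
    m1: list  # fast EMA of gradients (Adam-style first moment)
    m2: list  # slow EMA of gradients (long memory)
    nu: list  # EMA of squared gradients (second moment)
    step: int = 0


def init_state(n: int) -> AdEMAMixState:
    """Hypothetical helper: zero-initialized state for n scalar parameters."""
    return AdEMAMixState([0.0] * n, [0.0] * n, [0.0] * n)


def ademamix_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
                  beta3=0.9999, alpha=5.0, eps=1e-8, weight_decay=0.0):
    """One update mixing a fast and a slow gradient EMA (hypothetical helper)."""
    state.step += 1
    t = state.step
    bc1 = 1.0 - beta1 ** t  # bias correction for the fast EMA
    bc2 = 1.0 - beta2 ** t  # bias correction for the second moment
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state.m1[i] = beta1 * state.m1[i] + (1.0 - beta1) * g
        state.m2[i] = beta3 * state.m2[i] + (1.0 - beta3) * g
        state.nu[i] = beta2 * state.nu[i] + (1.0 - beta2) * g * g
        m1_hat = state.m1[i] / bc1
        nu_hat = state.nu[i] / bc2
        # Assumption: the slow EMA enters with weight alpha and no bias correction.
        update = (m1_hat + alpha * state.m2[i]) / (math.sqrt(nu_hat) + eps)
        # Assumption: decoupled (AdamW-style) weight decay.
        new_params.append(p - lr * (update + weight_decay * p))
    return new_params
```

With `alpha = 0` the slow EMA drops out and the step reduces to an Adam update with decoupled weight decay. A `beta3` close to 1 gives the slow EMA a long memory, so gradients from many steps back still contribute to each update.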
https://arxiv.org/abs/2409.03137
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
--- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/support
1645 episodes