A faster optimizer than Adam? From Stanford University

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
https://arxiv.org/abs/2305.14342

From the abstract: "Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction on the time and cost of training. Adam and its variants have been state-of-the-art for years, and more sophist..."

The link above is the source of the paper.

2023. 6. 5.
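For a rough idea of what a second-order update of this kind looks like, here is a minimal NumPy sketch of a Sophia-style step: a momentum (EMA of gradients) preconditioned by an estimate of the diagonal Hessian, with element-wise clipping. This is only an illustrative sketch based on the paper's general description, not its exact algorithm; the function name, hyperparameter names, and default values are assumptions, and the diagonal-Hessian estimate h is assumed to be refreshed every few steps elsewhere (e.g. by a Hutchinson-type estimator), as the paper proposes.

import numpy as np

def sophia_style_step(theta, grad, m, h,
                      lr=1e-4, beta1=0.9, gamma=0.01, eps=1e-12, rho=1.0):
    """One Sophia-style update (illustrative sketch, not the paper's exact Algorithm).

    theta : parameter vector
    grad  : current gradient
    m     : EMA of gradients (momentum)
    h     : EMA of a diagonal-Hessian estimate, refreshed periodically elsewhere
    """
    m = beta1 * m + (1.0 - beta1) * grad          # momentum: EMA of gradients
    precond = m / np.maximum(gamma * h, eps)      # precondition by the Hessian diagonal
    update = np.clip(precond, -rho, rho)          # element-wise clipping bounds each step
    theta = theta - lr * update
    return theta, m, h

The clipping is what keeps the update cheap and stable even when the Hessian estimate is noisy or near zero; refer to the paper itself for the exact algorithm and hyperparameters.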