Effects of Momentum in Implicit Bias of Gradient Flow for Diagonal Linear Networks

Lyu, B., Wang, H., Wang, Z. orcid.org/0000-0001-6157-0662 et al. (1 more author) (2025) Effects of Momentum in Implicit Bias of Gradient Flow for Diagonal Linear Networks. In: Proceedings of the AAAI Conference on Artificial Intelligence. Thirty-Ninth AAAI Conference on Artificial Intelligence, Thirty-Seventh Conference on Innovative Applications of Artificial Intelligence, 25 Feb - 04 Mar 2025, Philadelphia, USA. AAAI Press, pp. 19242-19250. ISSN: 2159-5399, EISSN: 2374-3468

Abstract

This paper focuses on the regularization effect of momentum-based methods in regression settings and analyzes the popular diagonal linear networks to precisely characterize the implicit bias of continuous versions of heavy-ball (HB) and Nesterov's method of accelerated gradients (NAG). We show that HB and NAG exhibit a different implicit bias from gradient descent (GD) for diagonal linear networks, in contrast to the classic linear regression problem, where momentum-based methods share the same implicit bias as GD. Specifically, the role of momentum in the implicit bias of GD is twofold: (a) HB and NAG induce extra initialization mitigation effects, similar to those of stochastic gradient descent (SGD), that are beneficial for generalization in sparse regression; (b) the implicit regularization effects of HB and NAG also depend explicitly on the initialization of gradients, which may not be benign for generalization. As a result, whether HB and NAG generalize better than GD depends jointly on these two effects, which are determined by parameters such as the learning rate, the momentum factor, and the integral of gradients. Our findings highlight the potentially beneficial role of momentum and can help explain its advantages in practice, such as when it leads to better generalization performance.
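
For readers who want to experiment with the setting the abstract describes, the following is a minimal sketch, not the authors' code. It assumes the common diagonal linear network parameterization theta = u*u - v*v with a squared loss, and it uses a discrete heavy-ball update as a stand-in for the continuous-time dynamics analyzed in the paper. The names `grad`, `loss`, and `train`, the synthetic data, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

def loss(X, y, theta):
    """Mean squared error (with a 1/2 factor) of the linear predictor X @ theta."""
    r = X @ theta - y
    return 0.5 * np.mean(r ** 2)

def grad(X, y, u, v):
    """Gradients of the loss w.r.t. u and v, assuming theta = u*u - v*v."""
    theta = u * u - v * v
    g = X.T @ (X @ theta - y) / len(y)   # gradient w.r.t. theta
    return 2 * u * g, -2 * v * g         # chain rule through u*u and -v*v

def train(X, y, init_scale=0.1, lr=0.01, momentum=0.0, steps=5000):
    """Heavy-ball iteration; momentum=0.0 reduces to plain gradient descent."""
    d = X.shape[1]
    u = np.full(d, init_scale)
    v = np.full(d, init_scale)           # u == v, so theta starts at zero
    u_prev, v_prev = u.copy(), v.copy()  # zero initial velocity
    for _ in range(steps):
        gu, gv = grad(X, y, u, v)
        u_next = u - lr * gu + momentum * (u - u_prev)
        v_next = v - lr * gv + momentum * (v - v_prev)
        u_prev, v_prev, u, v = u, v, u_next, v_next
    return u * u - v * v

# Hypothetical overparameterized sparse regression problem (n < d).
rng = np.random.default_rng(0)
n, d = 20, 40
X = rng.standard_normal((n, d))
theta_star = np.zeros(d)
theta_star[:3] = 1.0                     # sparse ground truth (assumption)
y = X @ theta_star

theta_gd = train(X, y, momentum=0.0)     # gradient descent
theta_hb = train(X, y, momentum=0.9)     # heavy-ball

print("l1 norm, GD:", np.abs(theta_gd).sum())
print("l1 norm, HB:", np.abs(theta_hb).sum())
```

Comparing the two recovered vectors (for example their l1 norms or their distance to `theta_star`) under different values of `init_scale`, `lr`, and `momentum` gives a rough way to probe how the solution selected by training changes with momentum. Such a toy run only illustrates the setup and does not reproduce the paper's results.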

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Lyu, B.; Wang, H.; Wang, Z. (https://orcid.org/0000-0001-6157-0662); Zhu, Z.
Copyright, Publisher and Additional Information:	© 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. This is an author produced version of a conference paper published in Proceedings of Thirty-Ninth AAAI Conference on Artificial Intelligence. Uploaded in accordance with the publisher's self-archiving policy.
Dates:	Published: 11 April 2025
Institution:	The University of Leeds
Academic Units:	The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds)
Depositing User:	Symplectic Publications
Date Deposited:	03 Sep 2025 10:39
Last Modified:	04 Sep 2025 09:43
Status:	Published
Publisher:	AAAI Press
Identification Number:	10.1609/aaai.v39i18.34118
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:231137
