TY - CONF
T1 - Nested Distributed Gradient Methods with Stochastic Computation Errors
AU - Iakovidou, Charikleia
AU - Wei, Ermin
N1 - Funding Information:
This work was supported by DARPA award HR-001117S0039.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
N2 - In this work, we consider the problem of a network of agents collectively minimizing a sum of convex functions. The agents in our setting can only access their local objective functions and exchange information with their immediate neighbors. Motivated by applications where computation is imperfect, including, but not limited to, empirical risk minimization (ERM) and online learning, we assume that only noisy estimates of the local gradients are available. To tackle this problem, we adapt a class of Nested Distributed Gradient methods (NEAR-DGD) to the stochastic gradient setting. These methods have minimal storage requirements, are communication-aware, and perform well in settings where gradient computation is costly while communication is relatively inexpensive. We investigate the convergence properties of our method under standard assumptions for stochastic gradients, i.e., unbiasedness and bounded variance. Our analysis indicates that our method converges at a linear rate to a neighborhood of the optimal solution when the local functions are strongly convex and appropriate constant steplengths are used. We also show that distributed optimization with stochastic gradients achieves a noise reduction effect similar to mini-batching, which scales favorably with network size. Finally, we present numerical results to demonstrate the effectiveness of our method.
UR - http://www.scopus.com/inward/record.url?scp=85077798722&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077798722&partnerID=8YFLogxK
DO - 10.1109/ALLERTON.2019.8919853
M3 - Conference contribution
AN - SCOPUS:85077798722
T3 - 2019 57th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2019
SP - 339
EP - 346
BT - 2019 57th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 57th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2019
Y2 - 24 September 2019 through 27 September 2019
ER -
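
Reader's note on the method the abstract describes: NEAR-DGD alternates a local gradient step with nested rounds of neighbor-to-neighbor averaging, and this paper replaces exact local gradients with unbiased, bounded-variance estimates. Below is a minimal Python sketch of one plausible reading of that scheme, under stated assumptions; the names (near_dgd_stochastic, stoch_grads, t_comm) and the toy quadratic problem are illustrative inventions, not the paper's code or its exact update rule.

import numpy as np

def near_dgd_stochastic(x0, W, stoch_grads, alpha, t_comm, iters, rng):
    # x0: (n_agents, dim) stacked local iterates.
    # W: doubly stochastic mixing matrix respecting the network topology.
    x = x0.copy()
    Wt = np.linalg.matrix_power(W, t_comm)  # t_comm nested communication rounds
    for _ in range(iters):
        # Local computation step with a noisy (unbiased, bounded-variance) gradient.
        g = np.stack([stoch_grads[i](x[i], rng) for i in range(x.shape[0])])
        # Nested consensus applied to the result of the gradient step.
        x = Wt @ (x - alpha * g)
    return x

# Toy usage: 4 agents on a ring minimize sum_i 0.5 * ||z - b_i||^2, with
# additive Gaussian noise standing in for gradient computation error.
rng = np.random.default_rng(0)
n, d, sigma = 4, 3, 0.1
b = rng.normal(size=(n, d))
stoch_grads = [lambda z, rng, bi=bi: (z - bi) + sigma * rng.normal(size=z.shape)
               for bi in b]
W = np.zeros((n, n))
for i in range(n):  # symmetric ring: half weight on self, quarter per neighbor
    W[i, i] = 0.5
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25
x = near_dgd_stochastic(np.zeros((n, d)), W, stoch_grads,
                        alpha=0.1, t_comm=2, iters=500, rng=rng)
# With a constant steplength, the iterates settle near b.mean(axis=0), the
# global minimizer, up to a noise floor.

In this sketch, a constant steplength alpha yields fast contraction toward a neighborhood of the minimizer whose size grows with sigma and alpha, mirroring the abstract's linear-rate-to-a-neighborhood claim; increasing t_comm spends extra communication to tighten consensus, which suits the costly-computation, cheap-communication regime the paper targets.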