Abstract

Training frontier-scale foundation models is typically limited to centralized computing clusters with fast interconnects, due to the high communication cost of distributed training. Protocol learning aims to address this limitation and make decentralized training feasible on consumer-grade GPUs connected over the internet. Beyond communication efficiency, protocol learning aims to realize volunteer, multi-party training and inference at scale by addressing research problems such as robustness to malicious actors and weight unextractability.

Publications

Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism.
Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, and Alexander Long.
Preprint, 2025. [pdf] [arxiv] [bib]

Nesterov Method for Asynchronous Pipeline Parallel Optimization.
Thalaiyasingam Ajanthan, Sameera Ramasinghe, Yan Zuo, Gil Avraham, and Alexander Long.
International Conference on Machine Learning (ICML), 2025. [pdf] [arxiv] [code] [bib]

Beyond Top-K: Structured Sparsification for Compression in Pipeline Parallel.
Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, and Alexander Long.
ICLR Workshop: Modularity for Collaborative, Decentralized, and Continual Deep Learning, 2025. [pdf] [blog] [bib]

Exploring Asynchronism in SWARM Parallelism.
Yan Zuo, Gil Avraham, Thalaiyasingam Ajanthan, Sameera Ramasinghe, and Alexander Long.
ICLR Workshop: Modularity for Collaborative, Decentralized, and Continual Deep Learning, 2025. [pdf] [blog] [bib]

Momentum Look-Ahead for Asynchronous Distributed Low-Communication Training.
Thalaiyasingam Ajanthan, Sameera Ramasinghe, Gil Avraham, Yan Zuo, and Alexander Long.
ICLR Workshop: Modularity for Collaborative, Decentralized, and Continual Deep Learning, 2025. [pdf] [blog] [bib]