Abstract
Training frontier-scale foundation models is typically limited to centralized computing clusters with fast interconnects, due to the high communication cost of distributed training. Protocol learning aims to remove this constraint and make decentralized training feasible on consumer-grade GPUs connected over the internet. Beyond communication efficiency, protocol learning aims to realize volunteer, multi-party training and inference at scale by addressing research problems such as robustness to malicious actors and weight unextractability.
Publications
Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism.
Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, and Alexander Long.
Preprint, 2025. [pdf] [arxiv] [bib]
Nesterov Method for Asynchronous Pipeline Parallel Optimization.
Thalaiyasingam Ajanthan, Sameera Ramasinghe, Yan Zuo, Gil Avraham, and Alexander Long.
International Conference on Machine Learning (ICML), 2025. [pdf] [arxiv] [code] [bib]
Beyond Top-K: Structured Sparsification for Compression in Pipeline Parallel.
Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, and Alexander Long.
ICLR Workshop: Modularity for Collaborative, Decentralized, and Continual Deep Learning, 2025. [pdf] [blog] [bib]