Abstract
Training frontier-scale foundation models is typically limited to centralized computing clusters with fast interconnects, due to the high communication cost of distributed training. Protocol learning aims to address this and make decentralized training feasible on consumer-grade GPUs connected over the internet. Beyond communication efficiency, protocol learning aims to enable volunteer, multi-party training and inference at scale by addressing research problems such as robustness to malicious actors and weight unextractability.
Publications
Unextractable Protocol Models: Collaborative Training and Inference without Weight Materialization.
Alexander Long, Chamin P Hewa Koneputugodage, Thalaiyasingam Ajanthan, Yan Zuo, Gil Avraham, Violetta Shevchenko, Hadi Mohaghegh Dolatabadi, and Sameera Ramasinghe.
Neural Information Processing Systems (NeurIPS), December 2025.
[to appear] [bib]
Mixtures of Subspaces for Bandwidth Efficient Context Parallel Training.
Sameera Ramasinghe, Thalaiyasingam Ajanthan, Hadi Mohaghegh Dolatabadi, Gil Avraham, Violetta Shevchenko, Yan Zuo, Chamin P Hewa Koneputugodage, and Alexander Long.
Neural Information Processing Systems (NeurIPS), December 2025.
[to appear] [bib]
Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism.
Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, and Alexander Long.
Neural Information Processing Systems (NeurIPS), December 2025.
[pdf] [arxiv] [bib]
Nesterov Method for Asynchronous Pipeline Parallel Optimization.
Thalaiyasingam Ajanthan, Sameera Ramasinghe, Yan Zuo, Gil Avraham, and Alexander Long.
International Conference on Machine Learning (ICML), 2025.
[pdf] [arxiv] [code] [bib]
Beyond Top-K: Structured Sparsification for Compression in Pipeline Parallel.
Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, and Alexander Long.
ICLR Workshop: Modularity for Collaborative, Decentralized, and Continual Deep Learning, 2025.
[pdf] [blog] [bib]