Abstract

Training frontier-scale foundation models is typically limited to centralized computing clusters with fast interconnects, due to the high communication cost of distributed training. Protocol learning aims to address this limitation and make decentralized training feasible on consumer-grade GPUs connected over the internet. Beyond communication efficiency, protocol learning aims to realize volunteer, multi-party training and inference at scale by addressing research problems such as robustness to malicious actors and weight unextractability.

Publications

Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism.
Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, and Alexander Long.
Preprint, 2025. [pdf] [arxiv] [bib]

Nesterov Method for Asynchronous Pipeline Parallel Optimization.
Thalaiyasingam Ajanthan, Sameera Ramasinghe, Yan Zuo, Gil Avraham, and Alexander Long.
International Conference on Machine Learning (ICML), 2025. [pdf] [arxiv] [code] [bib]

Beyond Top-K: Structured Sparsification for Compression in Pipeline Parallel.
Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, and Alexander Long.
ICLR Workshop: Modularity for Collaborative, Decentralized, and Continual Deep Learning, 2025. [pdf] [blog] [bib]

Exploring Asynchronism in SWARM Parallelism.
Yan Zuo, Gil Avraham, Thalaiyasingam Ajanthan, Sameera Ramasinghe, and Alexander Long.
ICLR Workshop: Modularity for Collaborative, Decentralized, and Continual Deep Learning, 2025. [pdf] [blog] [bib]

Momentum Look-Ahead for Asynchronous Distributed Low-Communication Training.
Thalaiyasingam Ajanthan, Sameera Ramasinghe, Gil Avraham, Yan Zuo, and Alexander Long.
ICLR Workshop: Modularity for Collaborative, Decentralized, and Continual Deep Learning, 2025. [pdf] [blog] [bib]