Effective and Efficient Distributed Temporal Graph Learning through Hotspot Memory Sharing.

Published in PVLDB, 2025

Memory-based temporal graph neural network (MTGNN) models are effective for predicting temporal graphs by using node memory and message-passing modules to capture temporal and structural information, respectively. However, distributed training for large graphs presents challenges such as accuracy loss and decreased efficiency due to remote features and memory transmission. Despite improvements in MTGNN system optimizations, issues like dynamic load imbalances, communication overhead, and memory staleness persist. To tackle these challenges, we introduce MemShare, a distributed MTGNN system. MemShare introduces a novel shared node memory paradigm that utilizes a small subset of shared nodes across machines and GPUs to reduce distributed communication for memory management. It incorporates techniques like shared nodes-centric graph partitioning, shared nodes-aware boundary decay sampling, and shared nodes-targeted synchronous smoothing aggregation. Experiments show that MemShare outperforms existing distributed MTGNN systems in accuracy and training efficiency. Our code will be publicly available.

Recommended citation: Longjiao Zhang, Rui Wang, Tongya Zheng, Ziqi Huang, Wenjie Huang, Xinyu Wang, Can Wang, Mingli Song, Sai Wu, Shuibing He, "Effective and Efficient Distributed Temporal Graph Learning through Hotspot Memory Sharing." the 51st International Conference on Very Large Data Bases (PVLDB 2025), London, United Kingdom, September 1-5, 2025