AdaGCL+: An Adaptive Subgraph Contrastive Learning Towards Tackling Topological Bias.

Journal: IEEE Transactions On Pattern Analysis And Machine Intelligence
Abstract

Large-scale graph data poses a scalability challenge for training, which is commonly addressed by batch sampling methods that divide the graph into smaller subgraphs and train on them in batches. However, this approach introduces a topological bias in the local batches relative to the complete graph structure: each batch misses either node features or edges. This topological bias is empirically shown to degrade the generalization of graph neural networks (GNNs). To address this issue, we propose adaptive subgraph contrastive learning (AdaGCL), which bridges the gap between large-scale batch sampling and its poor generalization. Specifically, AdaGCL augments graphs depending on the sampled batches and leverages a subgraph-granularity contrastive loss to learn node embeddings that are invariant across the augmented, imperfect graphs. To optimize the augmentation strategy for each downstream application, we introduce a node-centric information bottleneck (Node-IB) that controls the trade-off between the similarity and diversity of the original and augmented graphs. This enhanced version of AdaGCL, referred to as AdaGCL+, automates graph augmentation by dynamically adjusting the graph perturbation parameters (e.g., the edge-dropping rate) to minimize the downstream loss. Extensive experiments demonstrate that AdaGCL+ scales to graphs with millions of nodes using batch sampling methods, and that it consistently outperforms existing methods on numerous benchmark datasets in both node classification accuracy and runtime efficiency. The code is in: AdaGCL.
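The core mechanism the abstract describes, creating perturbed views of a sampled subgraph and contrasting node embeddings across views, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `drop_edges` and `info_nce`, the fixed drop rate, and the random stand-in embeddings are all illustrative assumptions; in the full method a GNN encoder produces the embeddings and AdaGCL+ tunes the drop rate adaptively via Node-IB.

```python
import numpy as np

def drop_edges(edges, drop_rate, rng):
    """Create an augmented view by randomly dropping a fraction of edges.
    `drop_rate` plays the role of the perturbation parameter that
    AdaGCL+ would adjust dynamically (fixed here for illustration)."""
    mask = rng.random(len(edges)) >= drop_rate
    return edges[mask]

def info_nce(z1, z2, tau=0.5):
    """InfoNCE-style contrastive loss: node i in view 1 is pulled toward
    node i in view 2 and pushed away from every other node in the batch."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                               # pairwise similarities
    sim = np.exp(sim - sim.max(axis=1, keepdims=True))  # stabilized softmax
    return float(np.mean(-np.log(np.diag(sim) / sim.sum(axis=1))))

rng = np.random.default_rng(0)
edges = np.array([(i, (i + 1) % 8) for i in range(8)])  # toy ring graph, 8 edges
view_a = drop_edges(edges, 0.2, rng)                    # two perturbed views of
view_b = drop_edges(edges, 0.2, rng)                    # the same sampled batch
# Random embeddings stand in for GNN encoder outputs on each view.
loss = info_nce(rng.standard_normal((8, 16)), rng.standard_normal((8, 16)))
```

Minimizing such a loss over many perturbed batch pairs is what encourages embeddings that are invariant to the topological bias introduced by batch sampling.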

Authors
Yili Wang, Yaohua Liu, Ninghao Liu, Rui Miao, Ying Wang, Xin Wang
