Non-IID Medical Image Segmentation Based on Cascaded Diffusion Model for Diverse Multi-Center Scenarios
Learning a high-performance global model from multi-center medical datasets is challenging because of privacy protection requirements and data heterogeneity in healthcare systems. Current federated learning approaches do not learn efficiently from Non-Independent and Identically Distributed (Non-IID) data and incur high communication costs. In this work, we propose a practical privacy computing framework that trains a Non-IID medical image segmentation model under various multi-center settings at low communication cost. Specifically, an efficient cascaded diffusion model is trained to generate image-mask pairs whose distribution is similar to that of the clients' training data, providing rich labeled data on the client side to mitigate heterogeneity. A label construction module is also developed to improve the quality of the generated image-mask pairs. Moreover, we propose a set of aggregation methods that obtain a global model from the data generated by the cascaded diffusion model in diverse scenarios: CD-Syn, CD-Ens, and its extension CD-KD. CD-Syn is a one-shot method that trains the segmentation model solely on the publicly shared generated datasets, whereas CD-Ens and CD-KD make maximal use of the local original data through one extra communication round of ensembling or knowledge distillation, respectively. The framework is therefore highly practical: its multiple aggregation methods can flexibly adapt to varying demands for efficiency, privacy, and accuracy. We systematically evaluated the proposed framework on five Non-IID medical datasets and observed an average improvement of 5.38% in Dice score over the baseline method (FednnU-Net).
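To make the evaluation metric and the ensemble idea concrete, the following is a minimal sketch, not the authors' implementation. It computes the Dice score on flattened binary masks, Dice = 2|P ∩ T| / (|P| + |T|), and fuses per-pixel foreground probabilities from several client models by uniform averaging before thresholding. The uniform averaging rule, the 0.5 threshold, and the helper names `dice_score`, `ensemble_average`, and `binarize` are illustrative assumptions; the abstract does not specify how CD-Ens combines client models.

```python
from typing import List, Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps keeps the score defined (and equal to 1.0) when both masks are empty
    return (2.0 * intersection + eps) / (total + eps)


def ensemble_average(prob_maps: List[Sequence[float]]) -> List[float]:
    """Average per-pixel foreground probabilities from several client models.

    Assumption: uniform averaging is used here only for illustration; it is
    not necessarily the ensembling rule used by CD-Ens.
    """
    if not prob_maps:
        raise ValueError("need at least one probability map")
    n = len(prob_maps)
    return [sum(values) / n for values in zip(*prob_maps)]


def binarize(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn fused probabilities into a binary mask (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probs]


if __name__ == "__main__":
    # Hypothetical foreground probabilities from three client models on a 6-pixel image
    client_probs = [
        [0.9, 0.8, 0.2, 0.1, 0.7, 0.0],
        [0.7, 0.9, 0.6, 0.2, 0.4, 0.1],
        [0.8, 0.6, 0.8, 0.0, 0.6, 0.2],
    ]
    ground_truth = [1, 1, 0, 0, 1, 0]
    fused = ensemble_average(client_probs)
    mask = binarize(fused)
    print(mask, round(dice_score(mask, ground_truth), 3))  # [1, 1, 1, 0, 1, 0] 0.857
```

In this toy example the fused mask has one false-positive pixel, so the Dice score is 2·3 / (4 + 3) ≈ 0.857. Averaging probabilities rather than majority-voting hard masks keeps each model's confidence in the fused prediction, which is one common reason to prefer it.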