Improving Representation of High-frequency Components for Medical Visual Foundation Models.

Journal: IEEE Transactions on Medical Imaging
Published:
Abstract

Foundation models have attracted significant attention for their impressive generalizability across diverse downstream tasks. However, they have been shown to be severely limited in representing high-frequency components and fine-grained details. In many medical imaging tasks, precise representation of such information is crucial because of the inherently intricate anatomical structures, sub-visual features, and complex boundaries involved. Consequently, the limited representation capacity of prevalent foundation models can result in considerable performance degradation or even failure in these tasks. To address these challenges, we propose a novel pretraining strategy for both 2D images and 3D volumes, named Frequency-advanced Representation Autoencoder (Frepa). Through high-frequency masking and low-frequency perturbation combined with embedding consistency learning, Frepa encourages the encoder to effectively represent and preserve high-frequency components in the image embeddings. Additionally, we introduce an innovative histogram-equalized image masking strategy, extending the Masked Autoencoder approach beyond ViT to other architectures such as Swin Transformer and convolutional networks. We develop Frepa across nine medical modalities and validate it on 32 downstream tasks for both 2D images and 3D volumes. Without fine-tuning, Frepa can outperform other self-supervised pretraining methods and, in some cases, even surpass task-specific foundation models. This improvement is particularly significant for tasks involving fine-grained details, with an increase of up to 15% in Dice score for retinal vessel segmentation and of up to 8% in IoU for lung tumor detection. Further experiments quantitatively reveal that Frepa enables superior high-frequency representation and preservation in the embeddings, underscoring its potential for developing more generalized and universal medical image foundation models.
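To make the notions of high-frequency and low-frequency components concrete, the following is a minimal sketch of splitting an image or volume into two frequency bands with a Fourier-domain radial mask, then masking the high band or perturbing the low band. It is an illustration only and not the Frepa implementation: the abstract does not specify how the masking or perturbation is carried out, so the ideal radial filter, the `cutoff` value, the `strength` parameter, and all function names are assumptions.

```python
import numpy as np


def radial_frequency_grid(shape):
    """Radial frequency (cycles per sample) of every FFT bin; works for 2D or 3D."""
    axes = [np.fft.fftfreq(n) for n in shape]  # each axis spans [-0.5, 0.5)
    grids = np.meshgrid(*axes, indexing="ij")
    return np.sqrt(sum(g ** 2 for g in grids))


def split_frequency_bands(image, cutoff=0.1):
    """Split `image` into low- and high-frequency parts.

    `cutoff` is a hypothetical threshold on radial frequency; bins at or
    below it count as low frequency, everything else as high frequency.
    """
    spectrum = np.fft.fftn(image)
    low_mask = radial_frequency_grid(image.shape) <= cutoff
    low = np.fft.ifftn(spectrum * low_mask).real
    high = np.fft.ifftn(spectrum * ~low_mask).real
    return low, high


def mask_high_frequency(image, cutoff=0.1):
    """Assumed form of high-frequency masking: drop the high band entirely."""
    low, _ = split_frequency_bands(image, cutoff)
    return low


def perturb_low_frequency(image, rng, cutoff=0.1, strength=0.2):
    """Assumed form of low-frequency perturbation: rescale the low band by a
    random factor in [1 - strength, 1 + strength], leaving the high band intact."""
    low, high = split_frequency_bands(image, cutoff)
    scale = 1.0 + strength * rng.uniform(-1.0, 1.0)
    return scale * low + high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))   # stand-in for a 2D medical image
    volume = rng.random((16, 16, 16))  # stand-in for a 3D volume
    print(mask_high_frequency(image).shape)
    print(perturb_low_frequency(volume, rng).shape)
```

Because the two bands partition the spectrum, their sum reconstructs the input, and a perturbation applied only to the low band leaves the high band unchanged. In a pretraining setup of the kind described above, such transformed inputs would presumably be fed to the encoder alongside the original, but the exact losses and pipeline are not given in the abstract.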

Authors
Yuetan Chu, Yilan Zhang, Zhongyi Han, Changchun Yang, Longxi Zhou, Gongning Luo, Chao Huang, Xin Gao