Enhancing Adversarial Defense via Brain Activity Integration Without Adversarial Examples.

Journal: Sensors (Basel, Switzerland)
Abstract

Adversarial attacks on large-scale vision-language foundation models, such as the contrastive language-image pretraining (CLIP) model, can significantly degrade performance across various tasks by generating adversarial examples that are perceptually indistinguishable from the original images. Although adversarial training methods, which train models on adversarial examples, have been proposed to defend against such attacks, they typically require prior knowledge of the attack and introduce a trade-off between robustness to adversarial examples and accuracy on clean images. To address these challenges, we propose an adversarial defense method based on human brain activity data, motivated by the hypothesis that such adversarial examples are not misrecognized by humans. The proposed method employs an encoder that integrates brain activity features with features of images augmented from the originals. By maximizing the similarity between the features predicted by this encoder and the original visual features, we obtain representations that combine the visual invariance of the human brain with the diversity introduced by data augmentation. Consequently, we construct a model that is robust against adversarial attacks while maintaining accuracy on clean images. Unlike existing methods, the proposed method is not trained with information about any specific adversarial attack and is therefore robust against unknown attacks. Extensive experiments demonstrate that the proposed method significantly enhances the robustness of the CLIP model to adversarial attacks without degrading accuracy on clean images. The primary contribution of this study is showing that the robustness-accuracy trade-off can be overcome using brain activity data.
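
The abstract describes an encoder that fuses brain activity features with features of augmented images and is trained by maximizing similarity to the visual features of the original clean images. The following is a minimal sketch of that feature-alignment idea, assuming a CLIP-style visual embedding space; all module, function, and variable names (BrainAugEncoder, alignment_loss, brain_dim, etc.) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the feature-alignment idea, assuming a CLIP-style
# embedding space. Names and architecture are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BrainAugEncoder(nn.Module):
    """Fuses brain-activity features with features of augmented images and
    predicts a feature vector in the (assumed) CLIP visual embedding space."""

    def __init__(self, brain_dim: int, img_dim: int, clip_dim: int = 512):
        super().__init__()
        self.brain_proj = nn.Linear(brain_dim, clip_dim)
        self.img_proj = nn.Linear(img_dim, clip_dim)
        self.fuse = nn.Sequential(
            nn.Linear(2 * clip_dim, clip_dim),
            nn.GELU(),
            nn.Linear(clip_dim, clip_dim),
        )

    def forward(self, brain_feats: torch.Tensor, aug_img_feats: torch.Tensor) -> torch.Tensor:
        # Concatenate the two projected modalities and map them to one vector.
        z = torch.cat([self.brain_proj(brain_feats), self.img_proj(aug_img_feats)], dim=-1)
        return self.fuse(z)


def alignment_loss(pred_feats: torch.Tensor, clean_clip_feats: torch.Tensor) -> torch.Tensor:
    """Maximize cosine similarity between the predicted features and the
    visual features of the original (clean) images, i.e. minimize 1 - cos."""
    pred = F.normalize(pred_feats, dim=-1)
    target = F.normalize(clean_clip_feats, dim=-1)
    return (1.0 - (pred * target).sum(dim=-1)).mean()
```

Under this sketch, clean_clip_feats would come from a frozen CLIP image encoder applied to the original images and aug_img_feats from the same encoder applied to augmented views, with only the fusion encoder being optimized; the actual architecture and training setup may differ in the paper.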