PlantPan: A comprehensive multi-species plant pan-genome database.

Journal: The Plant Journal: For Cell and Molecular Biology
Published:
Abstract

The pan-genome represents the complete genomic diversity of a species and serves as a valuable resource for studying species evolution and crop domestication and for guiding crop breeding and improvement. While several single-species plant pan-genome databases exist, multi-species pan-genome databases remain scarce. Additionally, differences in the methods and data types used for plant pan-genome analysis across databases hinder the comparison and integration of pan-genome information from different projects at both the multi-species and single-species levels. To tackle this challenge, we introduce PlantPan, a comprehensive database housing the results of pan-genome analysis for 195 genomes from 11 plant species. PlantPan aims to provide extensive information, including gene-centric and sequence-centric pan-genome information, graph-based pan-genomes, pan-genome openness profiles, gene functions and their variation characteristics, homologous genes, and gene clusters across different species. Statistically, PlantPan incorporates 9 163 011 genes; 694 191 gene clusters; 526 973 370 genome variations and 1 616 089 non-redundant genome variation groups at the species level; and 33 455 098 genome syntenies and 177 827 non-redundant genome synteny groups at the species level. Regarding functional genes, PlantPan contains 5 222 720 genes related to transcription factors, 395 247 literature-reported resistance genes, 455 748 predicted microbial/disease resistance genes, and 1 612 112 genes related to molecular pathways. In summary, PlantPan is a vital platform for advancing the application of pan-genomes in molecular breeding for crops and evolutionary research for plants.

Authors
Meiye Jiang, Qiheng Qian, Mingming Lu, Meili Chen, Zhuojing Fan, Yunfei Shang, Congfan Bu, Zhenglin Du, Shuhui Song, Jingyao Zeng, Jingfa Xiao