Ruojin Yan, Chunmei Fan, Shen Gu, Tingzhang Wang, Zi Yin, Xiao Chen. Gene print-based cell subtypes annotation of human disease across heterogeneous datasets with gPRINT[J]. Protein & Cell. doi: 10.1093/procel/pwaf001

Gene print-based cell subtypes annotation of human disease across heterogeneous datasets with gPRINT

  • Identification of disease-specific cell subtypes (DSCSs) has profound implications for understanding disease mechanisms, preoperative diagnosis, and precision therapy. However, achieving unified annotation of DSCSs across heterogeneous single-cell datasets remains a challenge. In this study, we developed the gPRINT algorithm (generalized approach for cell subtype Identification with single cell’s voicePRINT). Inspired by the principles of speech recognition in noisy environments, gPRINT transforms gene position and gene expression information into voiceprints based on the phenomena of ordered and clustered gene expression, obtaining a unique “gene print” pattern for each cell. We then integrated neural networks to mitigate the impact of background noise on the mapping of cell identity labels. We demonstrated the reproducibility of gPRINT across different donors, single-cell sequencing platforms, and disease subtypes, as well as its utility for automatic cell subtype annotation across datasets. Moreover, in external validation on the same tissue, gPRINT achieved an annotation accuracy of 98.37%, surpassing other algorithms. Furthermore, we applied this approach to fibrosis-associated diseases in multiple tissues throughout the body, as well as to the annotation of fibroblast subtypes within a single tissue, the tendon, where fibrosis is prevalent. We successfully achieved automatic prediction of tendinopathy-specific cell subtypes, key targets, and related drugs. In summary, gPRINT provides an automated and unified approach for identifying DSCSs across datasets, facilitating the elucidation of specific cell subtypes under different disease states and providing a powerful tool for exploring therapeutic targets in diseases.
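The abstract does not specify how expression is turned into a voiceprint. As a rough illustration only, the sketch below assumes that genes are ordered by genomic position and that the "gene print" is a short-time Fourier magnitude spectrogram (a standard speech-processing transform) of one cell's ordered log-expression profile. The function name `gene_print`, the `frame` and `hop` sizes, the Hann window, and the STFT choice are all hypothetical assumptions and are not gPRINT's published implementation.

```python
import numpy as np


def gene_print(expression, positions, frame=8, hop=4):
    """Hypothetical 'gene print' for one cell (illustrative assumption, not gPRINT's code).

    1. Order the cell's gene expression values by genomic position.
    2. Log-normalize so the ordered profile behaves like a 1-D signal.
    3. Compute an STFT-style magnitude spectrogram over sliding windows.
    """
    expression = np.asarray(expression, dtype=float)
    order = np.argsort(positions, kind="stable")   # genes sorted along the genome
    signal = np.log1p(expression[order])           # ordered log-expression "waveform"
    if signal.size < frame:                        # pad very short profiles to one frame
        signal = np.pad(signal, (0, frame - signal.size))
    window = np.hanning(frame)                     # taper each frame to reduce leakage
    n_frames = 1 + (signal.size - frame) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame] * window for i in range(n_frames)]
    )
    # Magnitude of the real FFT of each frame: shape (n_frames, frame // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.poisson(2.0, size=64)   # toy counts for 64 genes in one cell
    pos = rng.permutation(64)          # toy genomic positions
    print(gene_print(expr, pos).shape)  # (15, 5)
```

In this sketch the resulting matrix would serve as the per-cell input to a classifier; the neural network that gPRINT uses to map gene prints to cell identity labels is not reproduced here.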