Nature Biotechnology (IF 33.1), Pub Date: 2025-06-03, DOI: 10.1038/s41587-025-02633-9
Haoran Li, Dingjie Wang, Qi Gao, Puwen Tan, Yunhao Wang, Xiaoyu Cai, Aifu Li, Yue Zhao, Andrew L. Thurman, Seyed Amir Malekpour, Ying Zhang, Roberta Sala, Andrea Cipriano, Chia-Lin Wei, Vittorio Sebastiano, Chi Song, Nancy R. Zhang, Kin Fai Au
RNA sequencing has been widely applied for gene isoform quantification, but limitations exist in quantifying isoforms of complex genes accurately, especially for short reads. Here we identify genes that are difficult to quantify accurately with short reads and illustrate the information benefit of using long reads to quantify these regions. We present miniQuant, which ranks genes with quantification errors caused by the ambiguity of read alignments and integrates the complementary strengths of long reads and short reads with optimal combination in a gene- and data-specific manner to achieve more accurate quantification. These results are supported by rigorous mathematical proofs, validated with a wide range of simulation data, experimental validations and more than 17,000 public datasets from GTEx, TCGA and ENCODE consortia. We demonstrate miniQuant can uncover isoform switches during the differentiation of human embryonic stem cells to pharyngeal endoderm and primordial germ cell-like cells.
Title: Improving gene isoform quantification with miniQuant