Study design and the sampling of deleterious rare variants in biobank-scale datasets
Proceedings of the National Academy of Sciences of the United States of America (IF 9.4). Pub Date: 2025-06-03. DOI: 10.1073/pnas.2425196122
Margaret C. Steiner, Daniel P. Rice, Arjun Biddanda, Mariadaria K. Ianni-Ravn, Christian Porras, John Novembre

One key component of study design in population genetics is the “geographic breadth” of a sample (i.e., the breadth of the region across which individuals are sampled). How the geographic breadth of a sample affects observations of rare, deleterious variants is unclear, even though such variants are of particular interest for biomedical and evolutionary applications. Here, to gain insight into the effects of sample design on ascertained genetic variants, we formulate a stochastic model of dispersal, genetic drift, selection, mutation, and geographically concentrated sampling. We use this model to understand how the geographic breadth of sampling effort affects the discovery of negatively selected variants. We find that geographically broad samples discover more variants than geographically narrow samples do (an effect we label “discovery”), though these variants are detected at a lower average frequency than in narrow samples (e.g., as singletons; an effect we label “dilution”). Importantly, both effects are amplified for larger sample sizes and larger fitness effects. We validate these results using both population genetic simulations and empirical analyses of the UK Biobank. Our results are particularly important in two contexts: associating large-effect rare variants with particular phenotypes, and inferring negative selection from allele frequency data. Overall, our findings emphasize the importance of considering geographic breadth when designing and carrying out genetic studies, especially at biobank scale.
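The contrast between “discovery” and “dilution” can be illustrated with a deliberately crude toy, sketched below. This is not the authors' stochastic model: it does not simulate dispersal, drift, selection, or mutation. Instead, it simply assumes that each rare deleterious variant is confined to a small block of `spread` adjacent demes along a line (standing in for the geographic clustering of young alleles) and that each sampled individual inside that block carries it with probability `carrier_prob`. All function names (`make_variants`, `summarize`, `compare_designs`) and parameter values are hypothetical and chosen only for illustration. The sketch compares a narrow design, where every individual comes from a few central demes, with a broad design that spreads the same number of individuals over all demes.

```python
import random
from dataclasses import dataclass


@dataclass
class SampleSummary:
    n_discovered: int          # variants seen at least once ("discovery")
    mean_count: float          # mean sample count among discovered variants
    singleton_fraction: float  # share of discovered variants seen exactly once ("dilution")


def make_variants(n_variants, n_demes, spread, rng):
    """Place each rare variant in a contiguous block of demes around a random center.

    Assumption (hypothetical): carriers stay within `spread` demes of where the
    variant arose. The clustering is imposed by hand, not derived from a model.
    """
    variants = []
    for _ in range(n_variants):
        center = rng.randrange(n_demes)
        lo, hi = max(0, center - spread), min(n_demes - 1, center + spread)
        variants.append((lo, hi))
    return variants


def summarize(variants, demes_sampled, carrier_prob, rng):
    """Count carriers of each variant among the sampled individuals.

    `demes_sampled` maps deme index -> number of individuals sampled there.
    Inside a variant's block, each sampled individual carries it with
    probability `carrier_prob`; outside the block, never.
    """
    counts = []
    for lo, hi in variants:
        c = 0
        for deme in range(lo, hi + 1):
            k = demes_sampled.get(deme, 0)
            c += sum(rng.random() < carrier_prob for _ in range(k))
        if c > 0:
            counts.append(c)
    if not counts:
        return SampleSummary(0, 0.0, 0.0)
    return SampleSummary(
        n_discovered=len(counts),
        mean_count=sum(counts) / len(counts),
        singleton_fraction=sum(c == 1 for c in counts) / len(counts),
    )


def compare_designs(n_sample=200, n_demes=100, narrow_width=5, spread=2,
                    carrier_prob=0.2, n_variants=2000, seed=1):
    """Return (narrow, broad) summaries for the same set of toy variants."""
    rng = random.Random(seed)
    variants = make_variants(n_variants, n_demes, spread, rng)

    # Narrow design: all individuals drawn evenly from a few adjacent central demes.
    start = n_demes // 2 - narrow_width // 2
    narrow = {d: n_sample // narrow_width for d in range(start, start + narrow_width)}

    # Broad design: the same number of individuals scattered uniformly over all demes.
    broad = {}
    for _ in range(n_sample):
        d = rng.randrange(n_demes)
        broad[d] = broad.get(d, 0) + 1

    return (summarize(variants, narrow, carrier_prob, rng),
            summarize(variants, broad, carrier_prob, rng))


if __name__ == "__main__":
    narrow, broad = compare_designs()
    print("narrow:", narrow)
    print("broad: ", broad)
```

In this toy, a narrow sample can only ever see variants whose block overlaps its few demes, but it sees each of them many times. A broad sample touches nearly every block, yet typically catches only one or two carriers per variant, so a larger share of its discoveries are singletons. Varying `n_sample` or `spread` is one way to explore how the contrast changes, with the caveat that treating a smaller `spread` as a proxy for stronger selection is an assumption of this sketch rather than something derived here.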

Updated: 2025-06-03