Updating “The Future of Coding”: Qualitative Coding with Generative Large Language Models, Sociological Methods & Research


Sociological Methods & Research (IF 6.5), Pub Date: 2025-05-21, DOI: 10.1177/00491241251339188
Nga Than, Leanne Fan, Tina Law, Laura K. Nelson, Leslie McCall

Over the past decade, social scientists have adapted computational methods for qualitative text analysis, with the hope that they can match the accuracy and reliability of hand coding. The emergence of GPT and open-source generative large language models (LLMs) has transformed this process by shifting from programming to engaging with models using natural language, potentially mimicking the in-depth, inductive, and/or iterative process of qualitative analysis. We test the ability of generative LLMs to replicate and augment traditional qualitative coding, experimenting with multiple prompt structures across four closed- and open-source generative LLMs and proposing a workflow for conducting qualitative coding with generative LLMs. We find that LLMs can perform nearly as well as prior supervised machine learning models in accurately matching hand-coding output. Moreover, using generative LLMs as a natural language interlocutor closely replicates traditional qualitative methods, indicating their potential to transform the qualitative research process, despite ongoing challenges.

Updated: 2025-05-21
