Generative Multimodal Models for Social Science: An Application with Satellite and Streetscape Imagery


Sociological Methods & Research (IF 6.5), Pub Date: 2025-05-27, DOI: 10.1177/00491241251339673
Tina Law, Elizabeth Roberto

Although there is growing social science research examining how generative AI models can be effectively and systematically applied to text-based tasks, whether and how these models can be used to analyze images remain open questions. In this article, we introduce a framework for analyzing images with generative multimodal models, which consists of three core tasks: curation, discovery, and measurement and inference. We demonstrate this framework with an empirical application that uses OpenAI's GPT-4o model to analyze satellite and streetscape images (n = 1,101) to identify built environment features that contribute to contemporary residential segregation in U.S. cities. We find that when GPT-4o is provided with well-defined image labels, the model labels images with high validity compared to expert labels. We conclude with thoughts for other use cases and discuss how social scientists can work collaboratively to ensure that image analysis with generative multimodal models is rigorous, reproducible, ethical, and sustainable.
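The abstract reports that model labels show high validity when compared to expert labels, but does not say which agreement statistics were used. As an illustration only, the sketch below shows one common way such a comparison can be scored for a single binary image label: accuracy, precision, recall, F1, and Cohen's kappa. The function name `agreement_metrics`, the choice of metrics, and the toy label vectors are all assumptions made for this example; none of them come from the article.

```python
def agreement_metrics(expert, model, positive=1):
    """Score model labels against expert labels for one binary label.

    `expert` and `model` are equal-length sequences of labels; `positive`
    marks the "feature present" class. Hypothetical helper, not the
    authors' method.
    """
    if len(expert) != len(model):
        raise ValueError("label sequences must have the same length")
    if not expert:
        raise ValueError("label sequences must not be empty")

    pairs = list(zip(expert, model))
    tp = sum(1 for e, m in pairs if e == positive and m == positive)
    fp = sum(1 for e, m in pairs if e != positive and m == positive)
    fn = sum(1 for e, m in pairs if e == positive and m != positive)
    tn = sum(1 for e, m in pairs if e != positive and m != positive)
    n = len(pairs)

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_chance = (((tp + fn) / n) * ((tp + fp) / n)
                + ((tn + fp) / n) * ((tn + fn) / n))
    kappa = ((p_observed - p_chance) / (1 - p_chance)
             if p_chance != 1 else 1.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Toy data, invented for this example: 1 = feature present in the image.
expert_labels = [1, 1, 1, 0, 0, 0, 1, 0]
model_labels = [1, 1, 0, 0, 0, 1, 1, 0]
scores = agreement_metrics(expert_labels, model_labels)
```

Reporting a chance-corrected statistic such as kappa alongside raw accuracy is useful when a label is rare, since a model that never assigns the label can still score a high accuracy.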


Updated: 2025-05-27
