Zero‐shot framework for construction equipment task monitoring,Computer-Aided Civil and Infrastructure Engineering



Computer-Aided Civil and Infrastructure Engineering (IF 8.5), Pub Date: 2025-05-27, DOI: 10.1111/mice.13506
Jaewon Jeoung, Seunghoon Jung, Taehoon Hong

Vision‐based monitoring of construction equipment scales poorly because collecting and labeling large datasets across diverse environments is resource‐intensive. This study proposes a framework that combines Zero‐Shot Learning (ZSL) with a Multimodal Large Language Model (MLLM) to recognize construction equipment tasks from video frames without additional training data. The framework operates in two stages: (i) a zero‐shot construction equipment detection stage comprising detection and tracking modules, and (ii) an MLLM‐based monitoring stage that uses a proprietary model (GPT‐4o mini) to recognize tasks. In experiments, the framework achieved an F1‐score of 82.2% for equipment detection using ZSL. A Multiple Choice Question (MCQ) dataset was constructed to evaluate the MLLM, which achieved an accuracy of 79.0%. A practical case study focusing on excavator tasks demonstrated accurate recognition of both idle states and complex operations. These results highlight the proposed framework's potential to automate construction equipment monitoring.
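The two-stage structure and the two reported metrics can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the detector, the tracker, or the prompt format, so `monitor_frame`, the `detect` and `recognize` callables, and the `Track` type are all hypothetical placeholders. The metric helpers use the standard definitions of F1-score and accuracy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Track:
    """One piece of equipment followed across frames (hypothetical type)."""
    track_id: int
    label: str
    box: Box


def monitor_frame(
    frame: object,
    detect: Callable[[object], List[Tuple[str, Box]]],
    recognize: Callable[[object, Track, Sequence[str]], str],
    task_options: Sequence[str],
) -> Dict[int, str]:
    """Run both stages on one frame and return {track_id: task}.

    Stage (i): `detect` stands in for a zero-shot detector. Track IDs are
    assigned by enumeration here, which is only a placeholder for a real
    tracking module.
    Stage (ii): `recognize` stands in for an MLLM call that picks one task
    from `task_options` for the given track.
    """
    tracks = [
        Track(track_id=i, label=label, box=box)
        for i, (label, box) in enumerate(detect(frame))
    ]
    return {t.track_id: recognize(frame, t, task_options) for t in tracks}


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), computed from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mcq_accuracy(predicted: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of multiple-choice questions answered correctly."""
    if len(predicted) != len(answers):
        raise ValueError("predicted and answers must have the same length")
    if not answers:
        return 0.0
    return sum(p == a for p, a in zip(predicted, answers)) / len(answers)


if __name__ == "__main__":
    # Stub stages with made-up outputs, purely to show the data flow.
    def stub_detect(frame):
        return [("excavator", (0.0, 0.0, 10.0, 10.0))]

    def stub_recognize(frame, track, options):
        return options[0]

    print(monitor_frame(None, stub_detect, stub_recognize, ["idle", "digging"]))
    # Illustrative counts only; these are not the paper's data.
    print(f1_score(tp=8, fp=2, fn=2))
    print(mcq_accuracy(["A", "B", "C", "D"], ["A", "B", "C", "A"]))
```

Passing the stages in as callables keeps the sketch independent of any particular detector or model API; in practice `recognize` would wrap a request to the chosen MLLM with the cropped frame and the list of candidate tasks.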


Updated: 2025-05-27
