
-
Optimal Transport with Arbitrary Prior for Dynamic Resolution Network Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-26
Zhizhong Zhang, Shujun Li, Chenyang Zhang, Lizhuang Ma, Xin Tan, Yuan Xie
Dynamic resolution networks have proven crucial for reducing computational redundancy by automatically assigning a satisfactory resolution to each input image. However, resolution choices often collapse: prior works tend to assign images to the resolution routes whose computational cost is closest to the required FLOPs. In this paper, we propose a novel optimal transport…
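The collapse issue above is what a balanced-assignment view addresses. As a hedged illustration (not the paper's algorithm; all names are hypothetical), entropic optimal transport via Sinkhorn iterations can softly assign images to resolution routes while enforcing a prescribed route-usage marginal:

```python
# A minimal sketch (not the paper's algorithm) of balanced assignment via
# Sinkhorn-style optimal transport: images are softly assigned to resolution
# routes under a marginal constraint that prevents route collapse.
import numpy as np

def sinkhorn_assign(cost, route_capacity, eps=0.05, n_iters=200):
    """cost: (n_images, n_routes) matrix, e.g., negative routing logits.
    route_capacity: target fraction of images per route (sums to 1)."""
    K = np.exp(-cost / eps)                          # Gibbs kernel
    a = np.full(cost.shape[0], 1.0 / cost.shape[0])  # uniform image marginal
    b = np.asarray(route_capacity, dtype=float)
    u = np.ones_like(a)
    for _ in range(n_iters):                         # alternating scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]               # transport plan (soft assignment)

rng = np.random.default_rng(0)
cost = rng.random((8, 3))                            # 8 images, 3 resolution routes
plan = sinkhorn_assign(cost, route_capacity=[0.5, 0.3, 0.2])
print(plan.sum(axis=0))                              # ≈ [0.5, 0.3, 0.2]: no collapse
```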
-
DocScanner: Robust Document Image Rectification with Progressive Learning Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-26
Hao Feng, Wengang Zhou, Jiajun Deng, Qi Tian, Houqiang Li
Compared with flatbed scanners, portable smartphones provide more convenience for physical document digitization. However, such digitized documents are often distorted due to uncontrolled physical deformations, camera positions, and illumination variations. To this end, we present DocScanner, a novel framework for document image rectification. Different from existing solutions, DocScanner addresses…
-
AutoViT: Achieving Real-Time Vision Transformers on Mobile via Latency-aware Coarse-to-Fine Search Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-26
Zhenglun Kong, Dongkuan Xu, Zhengang Li, Peiyan Dong, Hao Tang, Yanzhi Wang, Subhabrata Mukherjee
Despite their impressive performance on various tasks, vision transformers (ViTs) are heavy for mobile vision applications. Recent works have proposed combining the strengths of ViTs and convolutional neural networks (CNNs) to build lightweight networks. Still, these approaches rely on hand-designed architectures with a pre-determined number of parameters. In this work, we address the challenge of…
-
Lightweight Structure-Aware Attention for Visual Understanding Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-26
Heeseung Kwon, Francisco M. Castro, Manuel J. Marin-Jimenez, Nicolas Guil, Karteek Alahari
The attention operator has been widely used as a basic building block in visual understanding since it provides some flexibility through its adjustable kernels. However, this operator suffers from inherent limitations: (1) the attention kernel is not discriminative enough, resulting in high redundancy, and (2) its computation and memory complexity is quadratic in the sequence length. In this paper, we propose…
-
PointOBB-v3: Expanding Performance Boundaries of Single Point-Supervised Oriented Object Detection Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-25
Peiyuan Zhang, Junwei Luo, Xue Yang, Yi Yu, Qingyun Li, Yue Zhou, Xiaosong Jia, Xudong Lu, Jingdong Chen, Xiang Li, Junchi Yan, Yansheng Li
With the growing demand for oriented object detection (OOD), recent studies on point-supervised OOD have attracted significant interest. In this paper, we propose PointOBB-v3, a stronger single point-supervised OOD framework. Compared to existing methods, it generates pseudo rotated boxes without additional priors and incorporates support for the end-to-end paradigm. PointOBB-v3 functions by integrating…
-
Modeling Scattering Effect for Under-Display Camera Image Restoration Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-25
Binbin Song, Jiantao Zhou, Xiangyu Chen, Shuning Xu
The under-display camera (UDC) technology furnishes users with an uninterrupted full-screen viewing experience, eliminating the need for notches or punch holes. However, the translucent properties of the display lead to substantial degradation in UDC images. This work addresses the challenge of restoring UDC images by specifically targeting the scattering effect induced by the display. We explicitly…
-
Supplementary Prompt Learning for Vision-Language Models Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-24
Rongfei Zeng, Zhipeng Yang, Ruiyun Yu, Yonggang Zhang
Pre-trained vision-language models like CLIP have shown remarkable capabilities across various downstream tasks with well-tuned prompts. Advanced methods tune prompts by optimizing context while keeping the class name fixed, implicitly assuming that the class names in prompts are accurate and not missing. However, this assumption may be violated in numerous real-world scenarios, leading to potential…
-
Local Concept Embeddings for Analysis of Concept Distributions in Vision DNN Feature Spaces Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-24
Georgii Mikriukov, Gesina Schwalbe, Korinna Bade
Insights into the learned latent representations are imperative for verifying deep neural networks (DNNs) in critical computer vision (CV) tasks. Therefore, state-of-the-art supervised Concept-based eXplainable Artificial Intelligence (C-XAI) methods associate each user-defined concept, such as “car”, with a single vector in the DNN latent space (concept embedding vector). In the case of concept segmentation…
-
MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-24
Jialv Zou, Bencheng Liao, Qian Zhang, Wenyu Liu, Xinggang Wang
Learning robust and scalable visual representations from massive multi-view video data remains a challenge in computer vision and autonomous driving. Existing pre-training methods either rely on expensive supervised learning with 3D annotations, limiting the scalability, or focus on single-frame or monocular inputs, neglecting the temporal information, which is fundamental for the ultimate application…
-
RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-23
Zhiqiang Yan, Xiang Li, Le Hui, Zhenyu Zhang, Jun Li, Jian Yang
Depth completion aims to recover dense depth maps from sparse ones, where color images are often used to facilitate this task. Recent depth methods primarily focus on image-guided learning frameworks. However, blurry guidance in the image and unclear structure in the depth still impede their performance. To tackle these challenges, we explore a repetitive design in our image-guided network to gradually…
-
A Generalized Contour Vibration Model for Building Extraction Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-22
Chunyan Xu, Shuaizhen Yao, Ziqiang Xu, Zhen Cui, Jian Yang
With the recent progress of deep learning, classic active contour models (ACMs) have become a promising solution for contour-based object extraction. Inspired by the wave vibration theory in physics, we propose a Generalized Contour Vibration Model (G-CVM) that inherits the force and motion principle of contour waves to automatically estimate building contours. The contour estimation problems…
-
Simplified Concrete Dropout - Improving the Generation of Attribution Masks for Fine-grained Classification Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-22
Dimitri Korsch, Maha Shadaydeh, Joachim Denzler
In fine-grained classification, which classifies images into subcategories within a common broader category, it is crucial to have precise visual explanations of the classification model’s decision. While commonly used attention- or gradient-based methods deliver explanations that are either too coarse or too noisy to reliably highlight subtle visual differences, perturbation-based methods…
-
Spatial-Temporal Transformer for Single RGB-D Camera Synchronous Tracking and Reconstruction of Non-rigid Dynamic Objects Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-21
Xiaofei Liu, Zhengkun Yi, Xinyu Wu, Wanfeng Shang
We propose a simple and effective method that views the problem of single RGB-D camera synchronous tracking and reconstruction of non-rigid dynamic objects as an aligned sequential point cloud prediction problem. Our method does not require additional data transformations (truncated signed distance function or deformation graphs, etc.), alignment constraints (handcrafted features or optical flow, etc.)…
-
Semantic-Aligned Learning with Collaborative Refinement for Unsupervised VI-ReID Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-19
De Cheng, Lingfeng He, Nannan Wang, Dingwen Zhang, Xinbo Gao
Unsupervised visible-infrared person re-identification (USL-VI-ReID) seeks to match pedestrian images of the same individual across different modalities without human annotations for model learning. Previous methods unify pseudo-labels of cross-modality images through label association algorithms and then design a contrastive learning framework for global feature learning. However, these methods overlook…
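For intuition, the label-association step can be illustrated with a standard Hungarian matching of cluster centroids across modalities (a generic sketch, not the paper's specific algorithm; names and shapes are hypothetical):

```python
# A hedged illustration (not the paper's method) of cross-modality label
# association: match visible and infrared cluster centroids one-to-one by
# minimizing pairwise distances with the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
vis_centroids = rng.normal(size=(6, 128))   # pseudo-label cluster centers (visible)
ir_centroids = rng.normal(size=(6, 128))    # pseudo-label cluster centers (infrared)

cost = cdist(vis_centroids, ir_centroids, metric="cosine")
vis_idx, ir_idx = linear_sum_assignment(cost)   # optimal one-to-one matching
unified = {int(v): int(i) for v, i in zip(vis_idx, ir_idx)}
print(unified)   # visible cluster -> associated infrared cluster
```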
-
Generalized Closed-Form Formulae for Feature-Based Subpixel Alignment in Patch-Based Matching Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-19
Laurent Valentin Jospin, Hamid Laga, Farid Boussaid, Mohammed Bennamoun
Patch-based matching measures the disparity between pixels in a source and a target image and is at the core of various methods in computer vision. When the subpixel disparity between the source and target images is required, the cost function or the target image has to be interpolated. While cost-based interpolation is easier to implement, multiple works have shown that image-based…
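For context, the classic cost-based refinement works like this: fit a parabola to the matching cost at the best integer disparity and its two neighbours, and read off the subpixel offset in closed form (a textbook baseline, not the paper's generalized formulae):

```python
# Classic parabola-fit subpixel refinement over a 1D cost curve.
import numpy as np

def subpixel_parabola(costs, d_best):
    """costs: 1D array of matching costs indexed by integer disparity."""
    c_m, c_0, c_p = costs[d_best - 1], costs[d_best], costs[d_best + 1]
    denom = c_m - 2.0 * c_0 + c_p
    offset = 0.0 if denom == 0 else 0.5 * (c_m - c_p) / denom
    return d_best + offset    # subpixel disparity estimate

costs = np.array([9.0, 4.0, 1.2, 2.5, 7.0])   # toy cost curve, minimum near d=2
print(subpixel_parabola(costs, d_best=2))      # ~2.18
```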
-
Learning to Deblur Polarized Images Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-19
Chu Zhou, Minggui Teng, Xinyu Zhou, Chao Xu, Imari Sato, Boxin Shi
A polarization camera can capture four linearly polarized images with different polarizer angles in a single shot, which is useful in polarization-based vision applications since the degree of linear polarization (DoLP) and the angle of linear polarization (AoLP) can be computed directly from the captured polarized images. However, since the on-chip micro-polarizers block part of the light so that the…
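The DoLP/AoLP computation mentioned above follows directly from the linear Stokes parameters; a minimal sketch using the textbook relations (illustrative, not paper-specific code):

```python
# DoLP and AoLP from four polarizer-angle captures via linear Stokes parameters.
import numpy as np

def dolp_aolp(i0, i45, i90, i135, eps=1e-8):
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # 0/90 degree difference
    s2 = i45 - i135                      # 45/135 degree difference
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)
    aolp = 0.5 * np.arctan2(s2, s1)      # in [-pi/2, pi/2]
    return dolp, aolp

rng = np.random.default_rng(0)
imgs = rng.random((4, 64, 64))           # toy polarized captures
dolp, aolp = dolp_aolp(*imgs)
print(dolp.mean(), aolp.mean())
```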
-
SimZSL: Zero-Shot Learning Beyond a Pre-defined Semantic Embedding Space Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-17
Mina Ghadimi Atigh, Stephanie Nargang, Martin Keller-Ressel, Pascal Mettes
Zero-shot recognition is centered around learning representations to transfer knowledge from seen to unseen classes. Where foundational approaches perform the transfer with semantic embedding spaces, e.g., from attributes or word vectors, the current state-of-the-art relies on prompting pre-trained vision-language models to obtain class embeddings. Whether zero-shot learning is performed with attributes…
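Either way, inference reduces to nearest-class-embedding search; a minimal illustrative sketch (shapes and names are hypothetical):

```python
# Zero-shot classification sketch: score each image feature against class
# embeddings (from attributes, word vectors, or a VLM text encoder) and pick
# the nearest unseen class.
import numpy as np

def zero_shot_predict(image_feats, class_embeds):
    """image_feats: (n, d); class_embeds: (c, d) for *unseen* classes."""
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    cls = class_embeds / np.linalg.norm(class_embeds, axis=1, keepdims=True)
    return (img @ cls.T).argmax(axis=1)   # cosine similarity -> class index

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 64))          # projected image features
embeds = rng.normal(size=(10, 64))        # 10 unseen-class embeddings
print(zero_shot_predict(feats, embeds))
```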
-
High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-16
Libo Zhang, Yongsheng Yu, Jiali Yao, Heng Fan
Generative Adversarial Network (GAN) inversion has demonstrated excellent performance in image inpainting, which aims to restore lost or damaged image texture using the unmasked content. Previous GAN inversion-based methods usually utilize well-trained GAN models as effective priors to generate realistic regions for missing holes. Despite their excellence, they ignore a hard constraint that the unmasked…
-
HumanLiff: Layer-wise 3D Human Diffusion Model Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-16
Shoukang Hu, Fangzhou Hong, Tao Hu, Liang Pan, Haiyi Mei, Weiye Xiao, Lei Yang, Ziwei Liu
3D human generation from 2D images has achieved remarkable progress through the synergistic utilization of neural rendering and generative models. Existing 3D human generative models mainly generate a clothed 3D human as an inseparable 3D model in a single pass, while rarely considering the layer-wise nature of a clothed human body, which often consists of the human body and various clothes such as…
-
Defending Against Adversarial Examples Via Modeling Adversarial Noise Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-14
Dawei Zhou, Nannan Wang, Bo Han, Tongliang Liu, Xinbo Gao
Adversarial examples have become a major threat to the reliable application of deep learning models, and this issue has driven the development of adversarial defenses. Adversarial noise contains well-generalizing and misleading features that can maliciously flip predicted labels. Motivated by this, we study modeling adversarial noise for defending against adversarial examples…
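As background for the noise being modeled, a standard FGSM perturbation (Goodfellow et al.) shows how a small, well-crafted noise flips predictions; this sketch illustrates the attack side only, not the defense proposed in the paper:

```python
# FGSM: one-step sign-of-gradient perturbation bounded by eps (background
# illustration of adversarial noise, not the paper's defense).
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()  # label-flipping noise

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_adv = fgsm(model, x, y)
print((x_adv - x).abs().max())   # perturbation bounded by eps
```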
-
IPAD: Iterative, Parallel, and Diffusion-Based Network for Scene Text Recognition Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-14
Xiaomeng Yang, Zhi Qiao, Yu Zhou
Scene text recognition has attracted increasing attention due to its diverse applications. Most state-of-the-art methods adopt an encoder-decoder framework with an attention mechanism, autoregressively generating text from left to right. Despite the convincing performance, this sequential decoding strategy constrains the inference speed. Conversely, non-autoregressive models provide faster…
-
An Information Theory-Inspired Strategy for Automated Network Pruning Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-12
Xiawu Zheng, Yuexiao Ma, Teng Xi, Gang Zhang, Errui Ding, Yuchao Li, Jie Chen, Yonghong Tian, Rongrong Ji
Despite their superior performance on many computer vision tasks, deep neural networks demand high computing power and a large memory footprint. Most existing network pruning methods require laborious human effort and prohibitive computational resources, especially when the constraints change. This practically limits the application of model compression when the model needs to be deployed on a wide…
-
Exploring Bidirectional Bounds for Minimax-Training of Energy-Based Models Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-13
Cong Geng, Jia Wang, Li Chen, Zhiyong Gao, Jes Frellsen, Søren Hauberg
Energy-based models (EBMs) estimate unnormalized densities in an elegant framework, but they are generally difficult to train. Recent work has linked EBMs to generative adversarial networks, by noting that they can be trained through a minimax game using a variational lower bound. To avoid the instabilities caused by minimizing a lower bound, we propose to instead work with bidirectional bounds, meaning…
-
Bamboo: Building Mega-Scale Vision Dataset Continually with Human–Machine Synergy Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-13
Yuanhan Zhang, Qinghong Sun, Yichun Zhou, Zexin He, Zhenfei Yin, Kun Wang, Lu Sheng, Yu Qiao, Jing Shao, Ziwei Liu
Large-scale datasets play a vital role in computer vision, but current datasets are annotated blindly, without differentiating between samples, making data collection inefficient and unscalable. The open question is how to build a mega-scale dataset actively. Although advanced active learning algorithms might be the answer, we experimentally found that they fall short in the realistic annotation scenario…
-
A Norm Regularization Training Strategy for Robust Image Quality Assessment Models Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-12
Yujia Liu, Chenxi Yang, Dingquan Li, Tingting Jiang, Tiejun Huang
Image Quality Assessment (IQA) models predict the quality score of input images. They can be categorized into Full-Reference (FR-) and No-Reference (NR-) IQA models based on the availability of reference images. These models are essential for performance evaluation and optimization guidance in the media industry. However, researchers have observed that introducing imperceptible perturbations to input…
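One common instantiation of norm regularization, sketched below as an assumption rather than the paper's exact strategy, penalizes the input-gradient norm of the predicted quality score so small perturbations cannot swing the prediction:

```python
# Gradient-norm regularized training sketch for a robust IQA model
# (a generic technique; not necessarily the paper's exact strategy).
import torch

def norm_regularized_loss(model, x, mos, lam=0.1):
    x = x.clone().requires_grad_(True)
    pred = model(x).squeeze(-1)
    fit = torch.nn.functional.mse_loss(pred, mos)      # fidelity to MOS labels
    grad, = torch.autograd.grad(pred.sum(), x, create_graph=True)
    smooth = grad.flatten(1).norm(dim=1).mean()        # input-gradient norm penalty
    return fit + lam * smooth

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 1))
x, mos = torch.rand(4, 3, 32, 32), torch.rand(4)
loss = norm_regularized_loss(model, x, mos)
loss.backward()   # trains toward accurate *and* perturbation-stable predictions
```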
-
CLIMS++: Cross Language Image Matching with Automatic Context Discovery for Weakly Supervised Semantic Segmentation Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-09
Jinheng Xie, Songhe Deng, Xianxu Hou, Zhaochuan Luo, Linlin Shen, Yawen Huang, Yefeng Zheng, Mike Zheng Shou
While promising results have been achieved in weakly-supervised semantic segmentation (WSSS), limited supervision from image-level tags inevitably induces discriminative reliance and spurious relations between target classes and background regions. Thus, the Class Activation Map (CAM) usually tends to activate discriminative object regions and falsely includes many class-related background regions. Without…
-
Autoregressive Temporal Modeling for Advanced Tracking-by-Diffusion Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-09
Pha Nguyen, Rishi Madhok, Bhiksha Raj, Khoa Luu
Object tracking is a widely studied computer vision task with applications in video and instance analysis. While paradigms such as tracking-by-regression, -detection, and -attention have advanced the field, generative modeling offers new potential. Although some studies explore the generative process in instance-based understanding tasks, they rely on prediction refinement in the coordinate space rather than…
-
HiLM-D: Enhancing MLLMs with Multi-scale High-Resolution Details for Autonomous Driving Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-07
Xinpeng Ding, Jianhua Han, Hang Xu, Wei Zhang, Xiaomeng Li
Recent efforts to use natural language for interpretable driving focus mainly on planning, neglecting perception tasks. In this paper, we address this gap by introducing ROLISP (Risk Object Localization and Intention and Suggestion Prediction), which targets interpretable risk object detection and suggestion prediction for ego-car motions. Accurate ROLISP implementation requires extensive reasoning to identify…
-
BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-06
Baoyuan Wu, Hongrui Chen, Mingda Zhang, Zihao Zhu, Shaokui Wei, Danni Yuan, Mingli Zhu, Ruotong Wang, Li Liu, Chao Shen
In recent years, backdoor learning has attracted increasing attention due to its effectiveness in investigating the adversarial vulnerability of artificial intelligence (AI) systems. Several seminal backdoor attack and defense algorithms have been developed, forming an increasingly fierce arms race. However, since backdoor learning involves various factors in different stages of an AI system (e.g.…
-
Paragraph-to-Image Generation with Information-Enriched Diffusion Model Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-05
Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang
Text-to-image models have recently experienced rapid development, achieving astonishing performance in terms of fidelity and textual alignment capabilities. However, given a long paragraph (up to 512 words), these generation models still struggle to achieve strong alignment and are unable to generate images depicting complex scenes. In this paper, we introduce an information-enriched diffusion model…
-
P2Object: Single Point Supervised Object Detection and Instance Segmentation Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-03
Pengfei Chen, Xuehui Yu, Xumeng Han, Kuiran Wang, Guorong Li, Lingxi Xie, Zhenjun Han, Jianbin Jiao
Object recognition using single-point supervision has attracted increasing attention recently. However, the performance gap compared with fully-supervised algorithms remains large. Previous works generated class-agnostic proposals in an image offline and then treated mixed candidates as a single bag, putting a huge burden on multiple instance learning (MIL). In this paper, we introduce Point-to-Box…
-
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-03
Dhruv Verma, Debaditya Roy, Basura Fernando
Situation recognition refers to the ability of an agent to identify and understand various situations or contexts based on available information and sensory inputs. It involves the cognitive process of interpreting data from the environment to determine what is happening, what factors are involved, and what actions caused those situations. This interpretation of situations is formulated as a semantic…
-
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-05-03
Zhiwei Hao, Jianyuan Guo, Li Shen, Yong Luo, Han Hu, Yonggang Wen
Recent advancements in multimodal fusion have witnessed the remarkable success of vision-language (VL) models, which excel in various multimodal applications such as image captioning and visual question answering. However, building VL models requires substantial hardware resources, where efficiency is restricted by two key factors: the extended input sequence of the language model with vision features…
-
Few-Shot Referring Video Single- and Multi-Object Segmentation Via Cross-Modal Affinity with Instance Sequence Matching Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-28
Heng Liu, Guanghui Li, Mingqi Gao, Xiantong Zhen, Feng Zheng, Yang Wang
Referring Video Object Segmentation (RVOS) aims to segment specific objects in videos based on provided natural language descriptions. As a new supervised visual learning task, achieving RVOS for a given scene requires a substantial amount of annotated data. However, only minimal annotations are usually available for new scenes in realistic scenarios. Another practical problem is that, apart from…
-
Interaction Confidence Attention for Human–Object Interaction Detection Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-28
Hong-Bo Zhang, Wang-Kai Lin, Hang Su, Qing Lei, Jing-Hua Liu, Ji-Xiang Du
In the human–object interaction (HOI) detection task, ensuring that interactive pairs receive higher attention weights while reducing the weight of non-interaction pairs is imperative for enhancing HOI detection accuracy. Guiding attention learning is also a key aspect of existing transformer-based algorithms. To tackle this challenge, this study proposes a novel approach termed Interaction Confidence…
-
A Closer Look at Benchmarking Self-supervised Pre-training with Image Classification Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-27
Markus Marks, Manuel Knott, Neehar Kondapaneni, Elijah Cole, Thijs Defraeye, Fernando Perez-Cruz, Pietro Perona
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels. The model is forced to learn about the data’s inherent structure or context by solving a pretext task. With SSL, models can learn from abundant and cheap unlabeled data, significantly reducing the cost of training models where labels are expensive or inaccessible…
-
Data-Adaptive Weight-Ensembling for Multi-task Model Fusion Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-25
Anke Tang, Li Shen, Yong Luo, Shiwei Liu, Han Hu, Bo Du, Dacheng Tao
Creating a multi-task model by merging models for distinct tasks has proven to be an economical and scalable approach. Recent research, like task arithmetic, demonstrates that a static solution for multi-task model fusion can be located within the vector space spanned by task vectors. However, the static nature of these methods limits their ability to adapt to the intricacies of individual instances…
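The static task-arithmetic baseline the abstract builds on can be written in a few lines of state-dict arithmetic: the merged model lives at the pretrained weights plus a weighted sum of task vectors (a minimal sketch; coefficients and model names are illustrative):

```python
# Task-arithmetic model merging: theta = theta_pre + sum_i lam_i * (theta_i - theta_pre).
import torch

def merge_task_vectors(pretrained, finetuned_models, coeffs):
    merged = {k: v.clone() for k, v in pretrained.items()}
    for model_sd, lam in zip(finetuned_models, coeffs):
        for k in merged:
            merged[k] += lam * (model_sd[k] - pretrained[k])  # add scaled task vector
    return merged

base = torch.nn.Linear(8, 2)                       # stand-in for a pretrained model
task_a, task_b = torch.nn.Linear(8, 2), torch.nn.Linear(8, 2)  # per-task finetunes
merged_sd = merge_task_vectors(
    base.state_dict(),
    [task_a.state_dict(), task_b.state_dict()],
    coeffs=[0.5, 0.5],
)
fused = torch.nn.Linear(8, 2)
fused.load_state_dict(merged_sd)                   # one static model for both tasks
```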
-
P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-21
Jiahao Nie, Fei Xie, Sifan Zhou, Xueyi Zhou, Dong-Kyu Chae, Zhiwei He
3D single object tracking (SOT) methods based on appearance matching have long suffered from insufficient appearance information caused by incomplete, textureless, and semantically deficient LiDAR point clouds. While the motion paradigm exploits motion cues instead of appearance matching for tracking, it incurs complex multi-stage processing and a segmentation module. In this paper, we first provide in-depth…
-
SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-15
Mingxin Huang, Dezhi Peng, Hongliang Li, Zhenghao Peng, Chongyu Liu, Dahua Lin, Yuliang Liu, Xiang Bai, Lianwen Jin
End-to-end scene text spotting, which aims to read the text in natural images, has garnered significant attention in recent years. However, recent state-of-the-art methods usually incorporate detection and recognition simply by sharing the backbone, which does not directly take advantage of the feature interaction between the two tasks. In this paper, we propose a new end-to-end scene text spotting…
-
D3T: Dual-Domain Diffusion Transformer in Triplanar Latent Space for 3D Incomplete-View CT Reconstruction Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-16
Xuhui Liu, Hong Li, Zhi Qiao, Yawen Huang, Xi Liu, Juan Zhang, Zhen Qian, Xiantong Zhen, Baochang Zhang
Computed tomography (CT) is a cornerstone of clinical imaging, yet its accessibility in certain scenarios is constrained by radiation exposure concerns and operational limitations within surgical environments. CT reconstruction from incomplete views has attracted increasing research attention due to its great potential in medical applications. However, it is inherently an ill-posed problem, which…
-
C2RF: Bridging Multi-modal Image Registration and Fusion via Commonality Mining and Contrastive Learning Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-15
Linfeng Tang, Qinglong Yan, Xinyu Xiang, Leyuan Fang, Jiayi Ma
Existing image fusion methods are typically only applicable to strictly aligned source images, and they introduce undesirable artifacts when source images are misaligned, compromising visual perception and downstream applications. In this work, we propose a mutually promoting multi-modal image registration and fusion framework based on commonality mining and contrastive learning, named C2RF. We adaptively…
-
Segment Anything in 3D with Radiance Fields Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-09
Jiazhong Cen, Jiemin Fang, Zanwei Zhou, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian
The Segment Anything Model (SAM) emerges as a powerful vision foundation model to generate high-quality 2D segmentation results. This paper aims to generalize SAM to segment 3D objects. Rather than replicating the data acquisition and annotation procedure, which is costly in 3D, we design an efficient solution, leveraging the radiance field as a cheap and off-the-shelf prior that connects multi-view…
-
A Survey of Representation Learning, Optimization Strategies, and Applications for Omnidirectional Vision Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-10
Hao Ai, Zidong Cao, Lin Wang
Omnidirectional image (ODI) data is captured with a field-of-view of 360°×180°, which is much wider than that of pinhole cameras and captures richer surrounding environment details than conventional perspective images. In recent years, the availability of consumer-level 360° cameras has made omnidirectional vision more popular, and the advance of deep learning (DL) has…
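A minimal sketch of why ODIs need dedicated treatment: each equirectangular pixel corresponds to a ray on the unit sphere, so a uniform pixel grid is highly non-uniform on the sphere, distorting standard 2D convolutions near the poles (illustrative code, not from the survey):

```python
# Map equirectangular pixel coordinates to unit-sphere viewing directions.
import numpy as np

def equirect_to_rays(width, height):
    u = (np.arange(width) + 0.5) / width          # [0,1) across longitude
    v = (np.arange(height) + 0.5) / height        # [0,1) across latitude
    lon = (u - 0.5) * 2.0 * np.pi                 # [-pi, pi)
    lat = (0.5 - v) * np.pi                       # [+pi/2, -pi/2)
    lon, lat = np.meshgrid(lon, lat)
    return np.stack([np.cos(lat) * np.sin(lon),   # x
                     np.sin(lat),                 # y (up)
                     np.cos(lat) * np.cos(lon)],  # z
                    axis=-1)

rays = equirect_to_rays(512, 256)
print(np.allclose(np.linalg.norm(rays, axis=-1), 1.0))   # unit directions
```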
-
AvatarStudio: High-Fidelity and Animatable 3D Avatar Creation from Text Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-07
Xuanmeng Zhang, Jianfeng Zhang, Chenxu Zhang, Jun Hao Liew, Huichao Zhang, Yi Yang, Jiashi Feng
We study the problem of creating high-fidelity and animatable 3D avatars from only textual descriptions. Existing text-to-avatar methods are either limited to static avatars which cannot be animated or struggle to generate animatable avatars with promising quality and precise pose control. To address these limitations, we propose AvatarStudio, a generative model that yields explicit textured 3D meshes…
-
Diffusion-Enhanced Test-Time Adaptation with Text and Image Augmentation Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-05
Chun-Mei Feng, Yuanyang He, Jian Zou, Salman Khan, Huan Xiong, Zhen Li, Wangmeng Zuo, Rick Siow Mong Goh, Yong Liu
Existing test-time prompt tuning (TPT) methods focus on single-modality data, primarily enhancing images and using confidence ratings to filter out inaccurate images. However, while image generation models can produce visually diverse images, single-modality data enhancement techniques still fail to capture the comprehensive knowledge provided by different modalities. Additionally, we note that the…
-
NU-AIR: A Neuromorphic Urban Aerial Dataset for Detection and Localization of Pedestrians and Vehicles Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-03
Craig Iaboni, Thomas Kelly, Pramod Abichandani
This paper presents an open-source aerial neuromorphic dataset that captures pedestrians and vehicles moving in an urban environment. The dataset, titled NU-AIR, features over 70 min of event footage acquired with a 640×480 resolution neuromorphic sensor mounted on a quadrotor operating in an urban environment. Crowds of pedestrians, different types of vehicles, and street scenes featuring…
-
Free Lunch to Meet the Gap: Intermediate Domain Reconstruction for Cross-Domain Few-Shot Learning Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-01
Tong Zhang, Yifan Zhao, Liangyu Wang, Jia Li
Cross-domain few-shot learning (CDFSL) endeavors to transfer generalized knowledge from the source domain to target domains using only a minimal amount of training data, and simultaneously faces a triplet of learning challenges: semantic disjointness, large domain discrepancy, and data scarcity. Different from predominant CDFSL works focused on generalized representations, we make novel attempts…
-
A Fast and Lightweight 3D Keypoint Detector Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-04-01
Chengzhuan Yang, Qian Yu, Hui Wei, Fei Wu, Yunliang Jiang, Zhonglong Zheng, Ming-Hsuan Yang
Keypoint detection is crucial in many visual tasks, such as object recognition, shape retrieval, and 3D reconstruction, as labeling point data is labor-intensive or sometimes implausible. Nevertheless, it is challenging to quickly and accurately locate keypoints in point clouds without supervision. This work proposes a fast and lightweight 3D keypoint detector that can efficiently and accurately detect…
-
Creatively Upscaling Images with Global-Regional Priors Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-03-31
Yurui Qian, Qi Cai, Yingwei Pan, Ting Yao, Tao Mei
Contemporary diffusion models show remarkable capability in text-to-image generation, while still being limited to restricted resolutions (e.g., 1024×1024). Recent advances enable tuning-free higher-resolution image generation by recycling pre-trained diffusion models and extending them via regional denoising or dilated sampling/convolutions. However, these models struggle to simultaneously…
-
I²MD: 3D Action Representation Learning with Inter- and Intra-Modal Mutual Distillation Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-03-27
Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li
Recent progress on self-supervised 3D human action representation learning is largely attributed to contrastive learning. However, in conventional contrastive frameworks, the rich complementarity between different skeleton modalities remains under-explored. Moreover, optimized with distinguishing self-augmented samples, models struggle with numerous similar positive instances in the case of limited…
-
Advances in 3D Neural Stylization: A Survey Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-03-28
Yingshu Chen, Guocheng Shao, Ka Chun Shum, Binh-Son Hua, Sai-Kit Yeung
Modern artificial intelligence offers a novel and transformative approach to creating digital art across diverse styles and modalities like images, videos and 3D data, unleashing the power of creativity and revolutionizing the way that we perceive and interact with visual content. This paper reports on recent advances in stylized 3D asset creation and manipulation with the expressive power of neural…
-
Pre-training for Action Recognition with Automatically Generated Fractal Datasets Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-03-26
Davyd Svyezhentsev, George Retsinas, Petros Maragos
In recent years, interest in synthetic data has grown, particularly in the context of pre-training the image modality to support a range of computer vision tasks, including object classification, medical imaging, etc. Previous work has demonstrated that synthetic samples, automatically produced by various generative processes, can replace real counterparts and yield strong visual representations. This…
-
ScenarioDiff: Text-to-video Generation with Dynamic Transformations of Scene Conditions Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-03-25
Yipeng Zhang, Xin Wang, Hong Chen, Chenyang Qin, Yibo Hao, Hong Mei, Wenwu Zhu
With the development of diffusion models, text-to-video generation has recently received significant attention and achieved remarkable success. However, existing text-to-video approaches suffer from the following weaknesses: i) they fail to control the trajectory of the subject as well as the process of scene transformations; ii) they can only generate videos with limited frames, failing to capture…
-
LaneCorrect: Self-Supervised Lane Detection Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-03-24
Ming Nie, Xinyue Cai, Hang Xu, Li Zhang
Lane detection has evolved to help highly functional autonomous driving systems understand driving scenes even in complex environments. In this paper, we work towards developing a generalized computer vision system able to detect lanes without using any annotation. We make the following contributions: (i) we illustrate how to perform unsupervised 3D lane segmentation by leveraging the distinctive intensity…
-
Camouflaged Object Detection with Adaptive Partition and Background Retrieval Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-03-22
Bowen Yin, Xuying Zhang, Li Liu, Ming-Ming Cheng, Yongxiang Liu, Qibin Hou
Recent works confirm the importance of local details for identifying camouflaged objects. However, how to identify the details around the target objects via background cues lacks in-depth study. In this paper, we take this into account and present a novel learning framework for camouflaged object detection, called AdaptCOD. To be specific, our method decouples the detection process into three parts…
-
Preconditioned Score-Based Generative Models Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-03-21
Hengyuan Ma, Xiatian Zhu, Jianfeng Feng, Li Zhang
Score-based generative models (SGMs) have recently emerged as a promising class of generative models. However, a fundamental limitation is that their sampling process is slow, requiring many (e.g., 2000) iterations of sequential computation. An intuitive acceleration method is to reduce the number of sampling iterations, which however causes severe performance degradation. We attribute this problem to the…
-
FlowSDF: Flow Matching for Medical Image Segmentation Using Distance Transforms Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-03-22
Lea Bogensperger, Dominik Narnhofer, Alexander Falk, Konrad Schindler, Thomas Pock
Medical image segmentation plays an important role in accurately identifying and isolating regions of interest within medical images. Generative approaches are particularly effective in modeling the statistical properties of segmentation masks that are closely related to the respective structures. In this work we introduce FlowSDF, an image-guided conditional flow matching framework, designed to represent…
-
CT3D++: Improving 3D Object Detection with Keypoint-Induced Channel-wise Transformer Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-03-20
Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Qiao Liang, Min-Jian Zhao, Jieping Ye
The field of 3D object detection from point clouds is rapidly advancing in computer vision, aiming to accurately and efficiently detect and localize objects in three-dimensional space. Current 3D detectors commonly fall short in terms of flexibility and scalability, with ample room for performance improvements. In this paper, our objective is to address these limitations by introducing two frameworks…
-
LR-ASD: Lightweight and Robust Network for Active Speaker Detection Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-03-19
Junhua Liao, Haihan Duan, Kanghui Feng, Wanbing Zhao, Yanbing Yang, Liangyin Chen, Yanru Chen
Active speaker detection is a challenging task aimed at identifying who is speaking. Due to the critical importance of this task in numerous applications, it has received considerable attention. Existing studies endeavor to enhance performance at any cost by inputting information from multiple candidates and designing complex models. While these methods have achieved excellent performance, their substantial…
-
A Solution to Co-occurrence Bias in Pedestrian Attribute Recognition: Theory, Algorithms, and Improvements Int. J. Comput. Vis. (IF 11.6) Pub Date : 2025-03-18
Yibo Zhou, Hai-Miao Hu, Jinzuo Yu, Haotian Wu, Shiliang Pu, Hanzi Wang
For pedestrian attribute recognition, we demonstrate that deep models can memorize the pattern of attribute co-occurrences inherent to the dataset, whether through explicit or implicit means. However, since attribute interdependency is highly variable and unpredictable across different scenarios, the modeled attribute co-occurrences de facto serve as a data selection bias that hardly generalizes…
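The co-occurrence pattern described above can be made concrete by estimating conditional attribute probabilities from the binary label matrix (a toy sketch with synthetic labels, not the paper's method):

```python
# Empirical attribute co-occurrence: cooc[i, j] ≈ P(attr_j = 1 | attr_i = 1).
# Strong off-diagonal entries are exactly the dataset patterns a model can memorize.
import numpy as np

rng = np.random.default_rng(0)
labels = (rng.random((1000, 5)) < 0.3).astype(float)   # toy multi-label matrix
labels[:, 1] = np.maximum(labels[:, 1], labels[:, 0])  # inject a co-occurrence

counts = labels.T @ labels                                 # joint counts
cooc = counts / np.maximum(counts.diagonal()[:, None], 1)  # conditional P(j|i)
print(cooc[0, 1])   # ≈ 1.0: attribute 1 almost always accompanies attribute 0
```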