HD-Vision@MMAsia2025

Summary

The International Workshop on Imaging, Processing, Perception, and Reasoning for High-Dimensional Visual Data (HD-Vision) addresses the rapidly evolving frontier of multimedia research, where visual information extends far beyond conventional 2D imagery. Emerging modalities such as light fields, event-based data, hyperspectral imaging, and multimodal sensor fusion encode rich spatial, temporal, angular, spectral, and cross-modal cues, unlocking unprecedented opportunities for comprehensive scene understanding.

These modalities also pose fundamental challenges in sensing, representation, and interpretation: they demand novel acquisition techniques, efficient compression and transmission, robust neural reconstruction, and semantics-aware reasoning. HD-Vision aims to bring together researchers from computational imaging, multimodal learning, and neural representation to confront these challenges and bridge the gap between foundational theory and practical deployment.

The workshop serves as a collaborative platform to present state-of-the-art advances and visionary perspectives, fostering interdisciplinary solutions applicable to domains such as autonomous systems, AR/VR, intelligent robotics, medical diagnostics, and remote sensing. By integrating diverse expertise, HD-Vision seeks to catalyze the next generation of high-dimensional multimedia understanding.

Call for Papers

We invite original submissions that address challenges and advances across the full spectrum of high-dimensional multimedia understanding. Topics of interest include, but are not limited to:

High-Dimensional Visual Sensing and Computational Imaging
Techniques for capturing complex modalities such as light fields, event-based data, hyperspectral imaging, and multimodal sensor systems.
Compression and Neural Representations for Complex Modalities
Efficient encoding, representation, and transmission methods, including learning-based codecs, neural implicit representations, NeRFs, and Gaussian splatting.
Semantic Understanding and Cross-Modal Perception
High-level interpretation of spatial, temporal, and angular information, including multimodal data registration, fusion, and semantic parsing.
Vision-Language Reasoning for Multi-modal Data
Foundation models and LLMs for spatially grounded multimodal reasoning, including pretraining strategies, parameter-efficient adaptation, and cross-modal alignment.
Datasets and Benchmarks for High-Dimensional Media
Construction, annotation, and evaluation of datasets spanning novel sensing modalities and complex data distributions.
Trustworthy and Efficient Multi-modal Intelligence
Energy-aware, robust, and privacy-preserving systems for high-dimensional data processing, with attention to ethical concerns and efficient deployment on edge devices.

Submission Website: Submit via CMT

Important Registration Note: All accepted papers must be covered by a full registration.

Keynote Speaker

Prof. Yi-Ping Phoebe Chen

(La Trobe University, Australia)

Speakers

Dr. Zhuoyuan Li

(USTC, China)

Dr. Hanyu Zhou

(NUS, Singapore)

Dr. Zixiang Zhao

(ETH Zürich, Switzerland)

Schedule

Note: The schedule is for reference only and is subject to change.

Join us via: https://nus-sg.zoom.us/j/4749365770?pwd=H5Fk0ZPO0rjxdewISh46d3lrP0v8lb.1

Time	Event
14:00 - 14:05	Welcome & Session Introduction
14:05 - 14:45	Oral Session (4 contributed talks, 8+2 min each)
14:45 - 15:25	Keynote: Prof. Yi-Ping Phoebe Chen (40 min)
15:30 - 16:00	Coffee Break
16:00 - 16:20	Invited Speaker Talk #1: Dr. Zhuoyuan Li (20 min)
16:20 - 16:40	Invited Speaker Talk #2: Dr. Hanyu Zhou (20 min)
16:40 - 17:00	Invited Speaker Talk #3: Dr. Zixiang Zhao (20 min)
17:00 - 17:10	Awards & Closing (Best Paper, etc.)

Accepted Papers

Title	Type
Exploiting Appearance Re-Emergence for Robust Visual Tracking	Oral
Spike Camera Image Reconstruction Based on an Efficient Spiking Transformer	Oral
PanoExtend: An Omnidirectional Image Super-Resolution Method Based on Spherical Expansion	Oral
Revisiting Intelligent Settlement and Nutritional Estimation of Small-bowl Dishes via Deep Learning	Oral
Seeing in the Noisy Dark: A New Real-world Benchmark and an Efficient Method for Extreme Low-light Image Enhancement	Poster
Point Long-Term Locality-Aware Transformer for Point Cloud Video Understanding	Poster
Multi-scale Dynamic Network for Document Shadow Removal	Poster
Memory-Augmented Continuous-Time Neural Policy for Vision-Guided Embodied Navigation	Poster
PanoExtend: An Omnidirectional Image Super-Resolution Method Based on Spherical Expansion	Poster
A Survey on Future Physical World Generation for Autonomous Driving	Poster
A Survey for Point Prompt of Segment Anything Model	Poster
Triple-Branch Fusion Module with Spatial-Frequency Cross-Attention Mechanism for Small Object Detection	Poster

Organizers

Zeyu Xiao

NUS, Singapore

Zhuoyuan Li

USTC, China

Xiang Chen

NJUST, China & EntroVision

Cong Zhang

CUHK, HKSAR

Hadi Amirpour

University of Klagenfurt, Austria

Yakun Ju

University of Leicester, UK

Zhiwei Xiong