Publications

A collection of my research work.

Models Under SCOPE: Scalable and Controllable Routing via Pre-hoc Reasoning

Qi Cao, Shuhao Zhang, Ruizhe Zhou, Ruiyi Zhang, Peijia Qin, Pengtao Xie

arXiv preprint 2026

We propose SCOPE, a model-routing framework that predicts how accurate and how expensive each model will be before running it, letting users control the cost-accuracy trade-off and incorporate new models naturally.

DOI · Code · Project
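A minimal sketch of the pre-hoc routing idea, assuming a learned predictor has already estimated each model's accuracy and cost; the candidate models, fields, and scoring rule here are illustrative, not SCOPE's actual components.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    pred_accuracy: float  # pre-hoc estimate: probability of a correct answer
    pred_cost: float      # pre-hoc estimate: cost of running the model

def route(candidates: list[Candidate], cost_weight: float) -> Candidate:
    """Pick the model maximizing predicted accuracy minus a cost penalty.

    cost_weight exposes the cost-accuracy trade-off to the user: 0 means
    accuracy only, larger values favor cheaper models. A new model is
    handled by simply appending another Candidate with its predictions.
    """
    return max(candidates, key=lambda c: c.pred_accuracy - cost_weight * c.pred_cost)

models = [Candidate("small-llm", 0.62, 0.1), Candidate("large-llm", 0.85, 1.0)]
print(route(models, cost_weight=0.1).name)  # large-llm (accuracy dominates)
print(route(models, cost_weight=0.5).name)  # small-llm (cost dominates)
```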

A General Image Fusion Approach Exploiting Gradient Transfer Learning and Fusion Rule Unfolding

Wu Wang, Liang-Jian Deng, Qi Cao, Gemine Vivone

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2026

We propose a unified deep image fusion framework that uses sequential gradient-transfer training and fusion-rule unfolding in a deep equilibrium network to efficiently handle multiple fusion tasks and generalize strongly to unseen ones (e.g., medical fusion).

DOI
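A toy sketch of the deep-equilibrium mechanism the framework builds on: instead of stacking layers, iterate one learned update to a fixed point. The contractive update below is a stand-in, not the paper's fusion rule.

```python
import numpy as np

def deq_fixed_point(f, z0, max_iter=100, tol=1e-6):
    """Solve z* = f(z*) by fixed-point iteration, as in a deep equilibrium
    network, where the output is the equilibrium of a single learned layer."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z

a, b = np.random.rand(8, 8), np.random.rand(8, 8)   # two source images
f = lambda z: 0.5 * z + 0.25 * (a + b)              # toy contractive "fusion" step
fused = deq_fixed_point(f, np.zeros_like(a))
# At equilibrium z* = 0.5 z* + 0.25 (a + b), i.e. z* = (a + b) / 2.
print(np.allclose(fused, (a + b) / 2, atol=1e-4))   # True
```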

DAJ: Data-Reweighted LLM Judge for Test-Time Scaling in Code Generation

Peijia Qin, Ruiyi Zhang, Qi Cao, Pengtao Xie

arXiv preprint 2026

We introduce DAJ, the first data-reweighted, reasoning-based LLM judge for Best-of-N test-time scaling, trained with verifiable rewards to address distribution shift and achieve state-of-the-art performance on LiveCodeBench and BigCodeBench.

DOI
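A schematic of Best-of-N test-time scaling with a judge: sample N candidate programs and keep the one the judge scores highest. `generate_candidate` and `judge_score` are placeholders standing in for the code model and the DAJ judge.

```python
from typing import Callable

def best_of_n(prompt: str,
              generate_candidate: Callable[[str], str],
              judge_score: Callable[[str, str], float],
              n: int = 8) -> str:
    """Sample n candidate solutions and return the judge's top pick."""
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=lambda code: judge_score(prompt, code))
```

Selection quality is bounded by the judge, which is why the paper focuses on training the judge to cope with the distribution shift it faces at test time.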

DreamPRM-Code: Function-as-Step Process Reward Model with Label Correction for LLM Coding

Ruiyi Zhang, Peijia Qin, Qi Cao, Pengtao Xie

arXiv preprint 2025

A coding process reward model that treats each function as a reasoning step (Chain-of-Function) and is trained with domain reweighting and label correction.

DOI
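A sketch of the function-as-step granularity: split a candidate program into its functions and aggregate a per-function process reward. The splitter uses Python's `ast`; `prm_score` is a placeholder for the trained reward model, and mean aggregation is an assumption.

```python
import ast
from typing import Callable

def function_step_score(source: str, prm_score: Callable[[str], float]) -> float:
    """Treat each top-level function as one reasoning step and average the
    per-step process-reward scores."""
    tree = ast.parse(source)
    steps = [ast.get_source_segment(source, node)
             for node in tree.body if isinstance(node, ast.FunctionDef)]
    if not steps:                      # no functions: score the whole program
        return prm_score(source)
    return sum(prm_score(step) for step in steps) / len(steps)

# Dummy scorer for demonstration only.
print(function_step_score("def f():\n    return 1\n", lambda s: len(s) / 100))
```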

DreamPRM-1.5: Unlocking the Potential of Each Instance for Multimodal Process Reward Model Training

Qi Cao, Pengtao Xie

arXiv preprint 2025

An updated version of DreamPRM that reweights individual training instances, yielding higher accuracy and greater robustness.

DOI · Code
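A sketch of the instance-reweighting idea: every training instance carries a learnable weight that scales its loss, and in the bilevel setup those weights are tuned so the resulting PRM does well on a held-out meta set. Only the inner weighted objective is shown; the parameterization is illustrative, not the paper's exact recipe.

```python
import numpy as np

def instance_weighted_loss(per_instance_loss: np.ndarray,
                           weight_logits: np.ndarray) -> float:
    """Inner objective of instance-reweighted training: a softmax over
    learnable logits keeps weights positive with mean 1, so no instance is
    dropped outright but each contributes according to its learned value."""
    w = np.exp(weight_logits - weight_logits.max())
    w = w / w.sum() * len(w)                            # positive, mean 1
    return float(np.mean(w * per_instance_loss))

losses = np.array([0.9, 0.2, 1.5])                      # per-instance losses
print(instance_weighted_loss(losses, np.zeros(3)))      # uniform -> plain mean
```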

Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning

Sai Ashish Somayajula, Bokai Hu, Qi Cao, Xin Pan, Pengtao Xie

Findings of the Association for Computational Linguistics: EMNLP 2025

Using PPO to reframe NLU as token-level reinforcement learning with label-based rewards, we substantially boost instruction-tuned LLMs under 14B parameters on GLUE/SuperGLUE, beating supervised fine-tuning and even GPT-4o on several sentiment and NLI datasets.

DOI · Code
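A sketch of the label-based reward: the generated answer tokens are the actions, and a terminal reward is computed from the gold label; PPO then optimizes the LLM against it. This exact-match reward is a simplified stand-in for the paper's design.

```python
def label_reward(generated_answer: str, gold_label: str) -> float:
    """+1 if the decoded answer matches the gold NLU label, else -1.
    A PPO trainer assigns this at the final token of the generation."""
    match = generated_answer.strip().lower() == gold_label.strip().lower()
    return 1.0 if match else -1.0

print(label_reward("Entailment", "entailment"))    # 1.0
print(label_reward("neutral", "contradiction"))    # -1.0
```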

DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning

Qi Cao, Ruiyi Wang, Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie

The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS) 2025

Spotlight @ Multimodal Algorithmic Reasoning Workshop

A multimodal process reward model (PRM) trained with domain reweighting; the top-ranked method on MathVista, MMMU, and R-Bench-V.

DOI · Code
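A sketch of domain reweighting: per-domain weights scale each domain's training loss, and in the paper they are learned via bilevel optimization on a meta set so that noisy domains contribute less. The domains and weights below are hypothetical.

```python
def domain_weighted_loss(domain_losses: dict[str, float],
                         domain_weights: dict[str, float]) -> float:
    """Aggregate per-domain PRM losses under (normalized) domain weights."""
    total = sum(domain_weights.values())
    return sum(domain_weights[d] / total * loss
               for d, loss in domain_losses.items())

losses = {"geometry": 0.7, "charts": 1.2, "ocr": 0.4}       # hypothetical
weights = {"geometry": 1.5, "charts": 0.5, "ocr": 1.0}      # learned upstream
print(domain_weighted_loss(losses, weights))                # ~0.683
```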

MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping

Xiaojun Shan, Qi Cao, Xing Han, Haofei Yu, Paul Pu Liang

arXiv preprint 2025

Adding more multimodal instruction-tuning tasks alone is unreliable; MINT groups tasks by modality interaction type (redundancy, selection, fusion) to reduce interference and improve both generalization and specialization.

DOI
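A sketch of interaction-based grouping: tag each instruction-tuning task with its dominant modality interaction and train within groups, so tasks with conflicting interaction types are not mixed. The task-to-tag map is hypothetical.

```python
from collections import defaultdict

TASK_INTERACTION = {            # hypothetical task -> interaction-type tags
    "image_captioning": "redundancy",
    "visual_grounding": "selection",
    "ocr_qa": "selection",
    "chart_qa": "fusion",
}

def group_by_interaction(tasks: list[str]) -> dict[str, list[str]]:
    """Bucket tasks by modality-interaction type for separate tuning."""
    groups: dict[str, list[str]] = defaultdict(list)
    for task in tasks:
        groups[TASK_INTERACTION[task]].append(task)
    return dict(groups)

print(group_by_interaction(list(TASK_INTERACTION)))
```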

Bidomain Modeling Paradigm for Pansharpening

Junming Hou, Qi Cao, Ran Ran, Che Liu, Junling Li, Liang-jian Deng

Proceedings of the 31st ACM International Conference on Multimedia (ACM MM) 2024

Oral

We propose BiPan, a bidomain pansharpening framework that models band-specific local spectral features and global spatial details in the Fourier domain, achieving state-of-the-art performance by better handling spectral diversity and MS image degradation.

DOI · Code
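A sketch of why the Fourier domain gives global spatial modeling: every frequency coefficient depends on all pixels, so amplitude and phase are inherently global representations a network branch can operate on. Only the lossless round trip is shown; the learned operators are the paper's contribution and are not reproduced here.

```python
import numpy as np

def fourier_split(img: np.ndarray):
    """Decompose an image into global Fourier amplitude and phase."""
    spec = np.fft.fft2(img)
    return np.abs(spec), np.angle(spec)

def fourier_merge(amplitude: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Reassemble an image from (possibly modified) amplitude and phase."""
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))

img = np.random.rand(16, 16)
amp, pha = fourier_split(img)
print(np.allclose(fourier_merge(amp, pha), img))   # True: lossless round trip
```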

Zero-shot Semi-supervised Learning for Pansharpening

Qi Cao, Liang-Jian Deng, Wu Wang, Junming Hou, Gemine Vivone

Information Fusion 2024

Zero-shot pansharpening (ZS-Pan) requires only a single pair of PAN/LRMS images and serves as a plug-and-play module for any pansharpening network; it is built on a two-phase, three-component semi-supervised design.

DOI · Code
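A sketch of the standard reduced-resolution trick that single-pair training builds on: degrade the PAN/LRMS pair by the sensor's scale ratio so the original LRMS can serve as the supervision target (Wald's protocol). This is background for the setting, not ZS-Pan's full two-phase pipeline, and the average-pool filter is a simplification.

```python
import numpy as np

def downsample(x: np.ndarray, ratio: int) -> np.ndarray:
    """Average-pool downsampling (a simplified stand-in for an MTF filter)."""
    h, w = x.shape[:2]
    x = x[:h - h % ratio, :w - w % ratio]
    return x.reshape(h // ratio, ratio, w // ratio, ratio,
                     *x.shape[2:]).mean(axis=(1, 3))

def make_reduced_res_sample(pan: np.ndarray, lrms: np.ndarray, ratio: int = 4):
    """One supervised sample from a single PAN/LRMS pair: degraded inputs,
    with the original LRMS as the reference output."""
    return downsample(pan, ratio), downsample(lrms, ratio), lrms

pan, lrms = np.random.rand(64, 64), np.random.rand(16, 16, 4)
lr_pan, lr_lrms, target = make_reduced_res_sample(pan, lrms)
print(lr_pan.shape, lr_lrms.shape, target.shape)  # (16, 16) (4, 4, 4) (16, 16, 4)
```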