Publications
A collection of my research work.
Models Under SCOPE: Scalable and Controllable Routing via Pre-hoc Reasoning
Qi Cao†, Shuhao Zhang†, Ruizhe Zhou, Ruiyi Zhang, Peijia Qin, Pengtao Xie
arXiv Preprint 2026
SCOPE is a model routing framework that predicts each candidate model's accuracy and cost before running it, letting users control cost-accuracy trade-offs and naturally accommodate newly added models.
A General Image Fusion Approach Exploiting Gradient Transfer Learning and Fusion Rule Unfolding
Wu Wang, Liang-Jian Deng, Qi Cao, Gemine Vivone
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2026
We propose a unified deep image fusion framework that uses sequential gradient-transfer training and fusion-rule unfolding in a deep equilibrium network to efficiently handle multiple fusion tasks and generalize strongly to unseen ones (e.g., medical fusion).
DAJ: Data-Reweighted LLM Judge for Test-Time Scaling in Code Generation
Peijia Qin, Ruiyi Zhang, Qi Cao, Pengtao Xie
arXiv Preprint 2026
We introduce DAJ, the first data-reweighted, reasoning-based LLM judge for Best-of-N test-time scaling, trained with verifiable rewards to address distribution shift and achieve state-of-the-art performance on LiveCodeBench and BigCodeBench.
DreamPRM-Code: Function-as-Step Process Reward Model with Label Correction for LLM Coding
Ruiyi Zhang, Peijia Qin, Qi Cao, Pengtao Xie
arXiv Preprint 2025
We introduce DreamPRM-Code, a coding process reward model that treats each function as a reasoning step (Chain-of-Function) and is trained with domain reweighting and label correction for LLM coding.
Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning
Sai Ashish Somayajula†, Bokai Hu†, Qi Cao, Xin Pan, Pengtao Xie
Findings of the Association for Computational Linguistics: EMNLP 2025
Using PPO to reframe NLU as token-level reinforcement learning with label-based rewards, we substantially boost instruction-tuned LLMs under 14B parameters on GLUE/SuperGLUE, beating supervised fine-tuning and even GPT-4o on several sentiment and NLI datasets.
DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning
Qi Cao, Ruiyi Wang, Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie
The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS) 2025
A multimodal Process Reward Model (PRM) trained with domain reweighting; the top-ranked method on MathVista, MMMU, and R-Bench-V.
MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping
Xiaojun Shan†, Qi Cao†, Xing Han†, Haofei Yu†, Paul Pu Liang
arXiv Preprint 2025
Simply adding more multimodal instruction-tuning tasks is unreliable; MINT groups tasks by modality interaction type (redundancy, selection, fusion) to reduce cross-task interference and improve both generalization and specialization.
Bidomain Modeling Paradigm for Pansharpening
Junming Hou†, Qi Cao†, Ran Ran, Che Liu, Junling Li, Liang-Jian Deng
Proceedings of the 32nd ACM International Conference on Multimedia (ACM MM) 2024
We propose BiPan, a bidomain pansharpening framework that models band-specific local spectral features and global spatial details in the Fourier domain, achieving state-of-the-art performance by better handling spectral diversity and MS image degradation.
Zero-shot Semi-supervised Learning for Pansharpening
Qi Cao, Liang-Jian Deng, Wu Wang, Junming Hou, Gemine Vivone
Information Fusion 2024
Zero-shot pansharpening (ZS-Pan) requires only a single pair of PAN/LRMS images, plugs into any pansharpening network as a plug-and-play module, and is trained with a two-phase, three-component semi-supervised scheme.