Qi Cao

PhD Student

University of California, San Diego

Research Interests

LLMs and LLM agents
Machine Learning
Reinforcement Learning

About

I am Qi Cao, a 2nd-year Ph.D. student in the Department of Electrical and Computer Engineering at the University of California, San Diego, advised by Prof. Pengtao Xie.

Previously, I received my B.S. from the Mathematics and Physics Class of the Yingcai Honors School at the University of Electronic Science and Technology of China, where I was fortunate to be advised by Prof. Liang-jian Deng.

I will join Meta as a research scientist intern in Summer 2026. My current research focuses on large language model (LLM) reasoning and on building harnesses for reasoning-model systems.

News

2026-03

DeepTech, ASI, and AIxiv cover our work AIBuildAI!

2026-03

AIBuildAI ranks No.1 on MLE-bench!

2026-02

UCSD Today News covers our work DreamPRM!

2026-02

We built a project page for SCOPE.

2025-12

I will join Meta as a research scientist intern in Summer 2026!

2024-09

Starting my PhD at UCSD.

Selected Publications

AIBuildAI: An AI agent that automatically builds AI models

Ruiyi Zhang, Peijia Qin, Qi Cao, Li Zhang, Pengtao Xie

arXiv Preprint

We present AIBuildAI, an AI agent that automatically builds AI models, with the goal of solving general AI tasks in an end-to-end manner.

Models Under SCOPE: Scalable and Controllable Routing via Pre-hoc Reasoning

Qi Cao, Shuhao Zhang, Ruizhe Zhou, Ruiyi Zhang, Peijia Qin, Pengtao Xie

arXiv Preprint

SCOPE is a model routing framework that predicts how accurate and how expensive each model will be before running it, allowing users to control cost-accuracy trade-offs and naturally handle new models.

DreamPRM-1.5: Unlocking the Potential of Each Instance for Multimodal Process Reward Model Training

Qi Cao, Pengtao Xie

arXiv Preprint

An updated version of DreamPRM that uses instance reweighting, achieving higher accuracy and greater robustness.

DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning

Qi Cao, Ruiyi Wang, Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie

The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS)

Spotlight @ Multimodal Algorithmic Reasoning Workshop

A multimodal Process Reward Model (PRM) trained with domain reweighting. Ranked No. 1 on MathVista, MMMU, and R-Bench-V.

Bidomain Modeling Paradigm for Pansharpening

Junming Hou, Qi Cao, Ran Ran, Che Liu, Junling Li, Liang-jian Deng

Proceedings of the 31st ACM International Conference on Multimedia (ACM MM)

Oral

We propose BiPan, a bidomain pansharpening framework that models band-specific local spectral features and global spatial details in the Fourier domain, achieving state-of-the-art performance by better handling spectral diversity and MS image degradation.