Ganqu Cui
Ph.D.
Shanghai AI Laboratory
Biography
I am a researcher at Shanghai AI Laboratory. I obtained my Ph.D. from the THUNLP Lab, Dept. of Computer Science and Technology, Tsinghua University, advised by Prof. Zhiyuan Liu.
Before that, I obtained my B.S. in Mathematics and Physics from Tsinghua University in 2019.
My research interests lie in LLM alignment and reinforcement learning. Previously, I worked on representation learning on graphs, especially graph neural networks and their applications.
We are hiring full-time researchers and interns! If you are interested in building large reasoning models with reinforcement learning, please drop me an email (cuiganqu AT pjlab.org.cn).
News
- [09/2025] Three papers were accepted by NeurIPS 2025.
- [08/2025] Honored to receive NSFC funding.
- [07/2025] Honored to receive the WAIC Yunfan Rising Star Award.
- [05/2025] We study the entropy mechanism of RL for LLMs. This study (1) identifies and quantifies entropy collapse; (2) analyzes the entropy dynamics; (3) proposes simple methods to extend RL training. Check out the code here.
- [01/2025] Proud to announce PRIME, a scalable reinforcement learning method with implicit process rewards. Our model, Eurus-2-7B-PRIME, surpassed GPT-4o on advanced math benchmarks. Check out our blog here.
- [12/2024] We release Implicit PRM: get free process rewards without process labels!
- [10/2024] CPO was accepted by EMNLP 2024.
- [07/2024] I joined Shanghai AI Laboratory as a research scientist.
- [07/2024] I graduated from Tsinghua with the Outstanding Doctoral Dissertation award.
- [05/2024] UltraFeedback was accepted by ICML 2024.
- [04/2024] Check out Eurus.
- [09/2023] Check out UltraFeedback.
- [09/2023] One paper was accepted by NeurIPS 2023.
- [05/2023] Four papers were accepted by ACL 2023 (2 Findings).
- [04/2023] Check out our Tool Learning paper.
- [10/2022] One paper was accepted by EMNLP 2022.
- [09/2022] Two papers were accepted by NeurIPS 2022 (1 Spotlight).
- [05/2022] One paper was accepted by NAACL 2022 Findings.
- [03/2022] One paper was accepted by ACL 2022.
Publications
* indicates equal contribution.
† indicates corresponding author.
- From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones?
Lifan Yuan*, Weize Chen*, Yuchen Zhang, Ganqu Cui†, Hanbin Wang, Ziming You, Ning Ding†, Zhiyuan Liu†, Maosong Sun, Hao Peng.
Preprint
[code]
- HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?
Fangchen Yu*, Haiyuan Wan*, Qianjia Cheng*, Yuchen Zhang, Jiacheng Chen, Fujun Han, Yulun Wu, Junchi Yao, Ruilizhen Hu, Ning Ding, Yu Cheng, Tao Chen, Lei Bai, Dongzhan Zhou, Yun Luo, Ganqu Cui†, Peng Ye†.
Preprint
[code]
- Intern-S1: A Scientific Multimodal Foundation Model
Lei Bai, ..., Ganqu Cui, ..., et al.
Preprint
[code]
- The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Ganqu Cui*, Yuchen Zhang*, Jiacheng Chen*, Lifan Yuan, Zhi Wang, Yuxin Zuo, Haozhan Li, Yuchen Fan, Huayu Chen, Weize Chen, Zhiyuan Liu, Hao Peng, Lei Bai, Wanli Ouyang, Yu Cheng, Bowen Zhou, Ning Ding.
Preprint
[code]
- Scaling Physical Reasoning with the PHYSICS Dataset
Shenghe Zheng*, Qianjia Cheng*, Junchi Yao*, Mengsong Wu, Haonan He, Ning Ding, Yu Cheng, Shuyue Hu, Lei Bai, Dongzhan Zhou, Ganqu Cui†, Peng Ye†.
NeurIPS Datasets & Benchmarks 2025
- TTRL: Test-Time Reinforcement Learning
Yuxin Zuo, Kaiyan Zhang, Shang Qu, Li Sheng, Xuekai Zhu, Biqing Qi, Youbang Sun, Ganqu Cui, Ning Ding, Bowen Zhou.
NeurIPS 2025
[code]
- AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset
Bingxiang He*, Wenbin Zhang*, Jiaxi Song, Cheng Qian, Zixuan Fu, Bowen Sun, Ning Ding, Haiwen Hong, Longtao Huang, Hui Xue, Ganqu Cui†, Wanxiang Che†, Zhiyuan Liu, Maosong Sun.
COLM 2025
- UltraIF: Advancing Instruction Following from the Wild
Kaikai An*, Li Sheng*, Ganqu Cui†, Shuzheng Si, Ning Ding, Yu Cheng, Baobao Chang†.
EMNLP 2025
[code]
- Process Reinforcement through Implicit Rewards
Ganqu Cui*, Lifan Yuan*, Zefan Wang, Hanbin Wang, Wendi Li, Bingxiang He, Yuchen Fan, Tianyu Yu, Qixin Xu, Weize Chen, Jiarui Yuan, Huayu Chen, Kaiyan Zhang, Xingtai Lv, Shuo Wang, Yuan Yao, Xu Han, Hao Peng, Yu Cheng, Zhiyuan Liu, Maosong Sun, Bowen Zhou, Ning Ding.
Preprint
[code]
- Free Process Rewards without Process Labels
Lifan Yuan*, Wendi Li*, Huayu Chen, Ganqu Cui†, Ning Ding†, Kaiyan Zhang, Bowen Zhou, Zhiyuan Liu†, Hao Peng.
ICML 2025
[code]
- Advancing LLM Reasoning Generalists with Preference Trees
Lifan Yuan*, Ganqu Cui*†, Hanbin Wang*, Ning Ding†, Xingyao Wang, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, Zhenghao Liu, Bowen Zhou, Hao Peng, Zhiyuan Liu†, Maosong Sun.
ICLR 2025
[code]
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies (Oral)
Shengding Hu, Yuge Tu, Xu Han, Chaoqun He, Ganqu Cui, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Weilin Zhao, Xinrong Zhang, Zheng Leng Thai, Kaihuo Zhang, Chongyi Wang, Yuan Yao, Chenyang Zhao, Jie Zhou, Jie Cai, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun.
COLM 2024
[code]
- Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
Yiju Guo*, Ganqu Cui*, Lifan Yuan, Ning Ding, Zexu Sun, Bowen Sun, Huimin Chen, Ruobing Xie, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun.
EMNLP 2024
[code]
- UltraFeedback: Boosting Language Models with Scaled AI Feedback
Ganqu Cui*, Lifan Yuan*, Ning Ding, Guanming Yao, Bingxiang He, Wei Zhu, Yuan Ni, Guotong Xie, Ruobing Xie, Yankai Lin, Zhiyuan Liu, Maosong Sun.
ICML 2024
[code]
- Decoder Tuning: Efficient Language Understanding as Decoding
Ganqu Cui, Wentao Li, Ning Ding, Longtao Huang, Zhiyuan Liu, Maosong Sun.
ACL 2023
[code]
- A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks (Spotlight)
Ganqu Cui*, Lifan Yuan*, Bingxiang He, Yangyi Chen, Zhiyuan Liu, Maosong Sun.
NeurIPS Datasets & Benchmarks 2022
[code]
- Prototypical Verbalizer for Prompt-based Few-shot Tuning
Ganqu Cui, Shengding Hu, Ning Ding, Longtao Huang, Zhiyuan Liu.
ACL 2022
[code]
- Adaptive Graph Encoder for Attributed Graph Embedding
Ganqu Cui, Jie Zhou, Cheng Yang, Zhiyuan Liu.
KDD 2020
[code]
Honors & Awards
- WAIC Yunfan Rising Star Award, 2025
- Tsinghua Outstanding Doctoral Dissertation (Top 10%), 2024
- Tsinghua Outstanding Graduate (Top 4%), 2024
- Longfor Scholarship, 2022, 2023
- Tsinghua-Sohu R&D Scholarship, 2022
- Tsinghua-Tang Junyuan Scholarship, 2023
Invited Talks
- 2025.05, The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models @ Tsinghua Foundation Model Center
- 2025.02, Process Reinforcement through Implicit Rewards @ Apple, NVIDIA, Tencent, Ant Group
- 2025.01, Process Reinforcement through Implicit Rewards @ Huawei Noah's Ark Lab
Projects
- PRIME-RL: A collection of reinforcement learning methods for large language models
- OpenBackdoor: An open-source toolkit for textual backdoor attacks and defenses
Professional Activities
- Conference Reviews: ICML, ICLR, The Web Conference, EMNLP, NeurIPS, ACL, SIGIR
- Journal Reviews: TMLR, IEEE Transactions on Knowledge and Data Engineering (TKDE), AI Open
Teaching Assistant
2019-2023 | Spring | TA in Natural Language Processing
2022-2023 | Fall | TA in Writing and Communication