Zheng Wang

I am a first-year PhD student in Computer Science at the University of Illinois Urbana-Champaign, advised by Prof. Minjia Zhang. My research focuses on efficient and interpretable foundation models. Specifically:

(1) Leveraging sparsity and system–algorithm co-design to accelerate the inference and training of LLMs and VLMs.
(2) Developing empirical and theoretical insights to understand and optimize LLMs and VLMs.

Prior to UIUC, I was fortunate to be a Research Assistant in the School of Computer Science at Georgia Tech, advised by Prof. Yingyan (Celine) Lin in the EIC Lab.

Outside of research, I like to stay healthy by working out regularly, and I'm really into playing 🎾 tennis 🎾 — it keeps me both physically and mentally sharp.

Email  /  Google Scholar  /  GitHub


News

[01/2026] Two papers accepted to ICLR 2026 (Slow Fast Policy Optimization; Universal Position Interpolation). Congratulations to all collaborators.

[01/2026] One paper accepted to ACL 2026 (Hidden States as Early Signals).

[01/2026] One paper accepted to EACL 2026 (Think Hard Only When Needed).

[09/2025] ORCHES accepted to MICRO 2025.

[08/2025] Started my PhD at UIUC with Prof. Minjia Zhang!

[05/2025] One paper accepted to ACL 2025 (LAMB).

[07/2024] KVMerger released on arXiv.

[05/2024] Two papers accepted to ICML 2024 (Attention Calibration for LLMs, Linearized-LLM).

[05/2024] One paper accepted to DAC 2024 (EDGE-LLM).

Publications

(* denotes equal contribution)

Efficient LLM Reasoning & Test-Time Compute

Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning

Ziyan Wang*, Zheng Wang*, Xingwei Qu, Qi Cheng, Jie Fu, Shengpu Tang, Minjia Zhang, Xiaoming Huo

ICLR 2026

Hidden States as Early Signals: Step-level Trace Evaluation and Pruning for Efficient Test-Time Scaling

Zhixiang Liang, Beichen Huang, Zheng Wang, Minjia Zhang

ACL Findings 2026

Think Hard Only When Needed: A Hybrid Best-of-N and Beam Search for Efficient Test-Time Compute

Hyewon Suh, Chaojian Li, Cheng-Jhih Shih, Zheng Wang, Kejing Xia, Yonggan Fu, Yingyan (Celine) Lin

EACL 2026

ORCHES: Orchestrated Test-Time-Compute-based LLM Reasoning on Collaborative GPU–PIM Heterogeneous System

Sixu Li, Yuzhou Chen, Chaojian Li, Yonggan Fu, Zheng Wang, Zhongzhi Yu, Haoran You, Zhifan Ye, Wei Zhou, Yongan Zhang, Yingyan (Celine) Lin

MICRO 2025

Efficient LLM & VLM Inference

From Collapse to Control: Understanding and Extending Context Length in Emerging Hybrid Models via Universal Position Interpolation

Haochen Shen, Davis Wertheimer, Zheng Wang, Garrett Goon, Derrick Liu, Naigang Wang, Mudhakar Srivatsa, Raghu K. Ganti, Minjia Zhang

ICLR 2026

PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed Inference

Yushu Zhao*, Zheng Wang*, Minjia Zhang

arXiv preprint, 2025

LAMB: A Training-Free Method to Enhance the Long-Context Understanding of SSMs via Attention-Guided Token Filtering

Zhifan Ye, Zheng Wang, Kejing Xia, Jihoon Hong, Leshu Li, Lexington A. Whalen, Cheng Wan, Haoran You, Yingyan (Celine) Lin, Souvik Kundu

ACL 2025

Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks

Zheng Wang, Boxiao Jin, Zhongzhi Yu, Minjia Zhang

arXiv preprint, 2024

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan (Celine) Lin

ICML 2024

Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

Zhongzhi Yu*, Zheng Wang*, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan (Celine) Lin

ICML 2024

EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning & Voting

Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan (Celine) Lin

DAC 2024

Teaching

  • Teaching Assistant — CSE 8803 Machine Learning for Neural/Behavior Data, Georgia Tech, Fall 2024. Instructor: Prof. Anqi Wu.
  • Teaching Assistant — CSE 6740 Computational Data Analysis (Machine Learning), Georgia Tech, Spring 2025. Instructor: Prof. Anqi Wu.

Services

  • Conference Reviewer: ICML 2025, NeurIPS 2025, AAAI 2025, ICLR 2026, ICML 2026, ECCV 2026, NeurIPS 2026

Selected Awards

  • [Jun. 2023] Outstanding Graduate of Beijing
  • [Nov. 2022] Presidential Fellowship, 2021–2022 Academic Year
  • [Nov. 2022] Xiaomi Special Scholarship, 2021–2022 Academic Year

Design and source code from jonbarron. Layout inspired by Yonggan Fu.