Zheng Wang

I am a first-year PhD student in Computer Science at the University of Illinois Urbana-Champaign, advised by Prof. Minjia Zhang. My research focuses on efficient and interpretable foundation models. Specifically:

(1) Leveraging sparsity and system–algorithm co-design to accelerate the inference and training of LLMs and VLMs.
(2) Developing empirical and theoretical insights to understand and optimize LLMs and VLMs.

Prior to UIUC, I was fortunate to be a Research Assistant in the School of Computer Science at Georgia Tech, advised by Prof. Yingyan (Celine) Lin in the EIC Lab.

Outside of research, I like to stay healthy by working out regularly, and I'm really into playing 🎾 tennis 🎾 — it keeps me both physically and mentally sharp.

Email  /  Google Scholar  /  GitHub


News

[01/2026] Two papers accepted to ICLR 2026 (Slow Fast Policy Optimization; Universal Position Interpolation). Congratulations to all collaborators.

[01/2026] One paper accepted to ACL 2026 (Hidden States as Early Signals).

[01/2026] One paper accepted to EACL 2026 (Think Hard Only When Needed).

[09/2025] ORCHES accepted to MICRO 2025.

[08/2025] Started my PhD at UIUC with Prof. Minjia Zhang!

[05/2025] One paper accepted to ACL 2025 (LAMB).

[07/2024] KVMerger released on arXiv.

[05/2024] Two papers accepted to ICML 2024 (Attention Calibration for LLMs, Linearized-LLM).

[05/2024] One paper accepted to DAC 2024 (EDGE-LLM).

Publications

(* denotes equal contribution)

Efficient LLM Reasoning & Test-Time Compute

Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning

Ziyan Wang*, Zheng Wang*, Xingwei Qu, Qi Cheng, Jie Fu, Shengpu Tang, Minjia Zhang, Xiaoming Huo

ICLR 2026

Hidden States as Early Signals: Step-level Trace Evaluation and Pruning for Efficient Test-Time Scaling

Zhixiang Liang, Beichen Huang, Zheng Wang, Minjia Zhang

ACL Findings 2026

Think Hard Only When Needed: A Hybrid Best-of-N and Beam Search for Efficient Test-Time Compute

Hyewon Suh, Chaojian Li, Cheng-Jhih Shih, Zheng Wang, Kejing Xia, Yonggan Fu, Yingyan (Celine) Lin

EACL 2026

ORCHES: Orchestrated Test-Time-Compute-based LLM Reasoning on Collaborative GPU–PIM Heterogeneous System

Sixu Li, Yuzhou Chen, Chaojian Li, Yonggan Fu, Zheng Wang, Zhongzhi Yu, Haoran You, Zhifan Ye, Wei Zhou, Yongan Zhang, Yingyan (Celine) Lin

MICRO 2025

Efficient LLM & VLM Inference

From Collapse to Control: Understanding and Extending Context Length in Emerging Hybrid Models via Universal Position Interpolation

Haochen Shen, Davis Wertheimer, Zheng Wang, Garrett Goon, Derrick Liu, Naigang Wang, Mudhakar Srivatsa, Raghu K. Ganti, Minjia Zhang

ICLR 2026

PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed Inference

Yushu Zhao*, Zheng Wang*, Minjia Zhang

arXiv preprint, 2025

LAMB: A Training-Free Method to Enhance the Long-Context Understanding of SSMs via Attention-Guided Token Filtering

Zhifan Ye, Zheng Wang, Kejing Xia, Jihoon Hong, Leshu Li, Lexington A. Whalen, Cheng Wan, Haoran You, Yingyan (Celine) Lin, Souvik Kundu

ACL 2025

Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks

Zheng Wang, Boxiao Jin, Zhongzhi Yu, Minjia Zhang

arXiv preprint, 2024

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan (Celine) Lin

ICML 2024

Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

Zhongzhi Yu*, Zheng Wang*, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan (Celine) Lin

ICML 2024

EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning & Voting

Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan (Celine) Lin

DAC 2024

Teaching

  • Teaching Assistant — CSE 8803 Machine Learning for Neural/Behavior Data, Georgia Tech, Fall 2024. Instructor: Prof. Anqi Wu.
  • Teaching Assistant — CSE 6740 Computational Data Analysis (Machine Learning), Georgia Tech, Spring 2025. Instructor: Prof. Anqi Wu.

Services

  • Conference Reviewer: ICML 2025, NeurIPS 2025, AAAI 2025, ICLR 2026, ICML 2026, ECCV 2026, NeurIPS 2026

Selected Awards

  • [Jun. 2023] Outstanding Graduate of Beijing
  • [Nov. 2022] Presidential Fellowship, 2021–2022 Academic Year
  • [Nov. 2022] Xiaomi Special Scholarship, 2021–2022 Academic Year

Design and source code from jonbarron. Layout inspired by Yonggan Fu.