News
[01/2026] Two papers accepted to ICLR 2026 (Slow Fast Policy Optimization; Universal Position Interpolation). Congratulations to all collaborators.
[01/2026] One paper accepted to ACL 2026 (Hidden States as Early Signals).
[01/2026] One paper accepted to EACL 2026 (Think Hard Only When Needed).
[09/2025] ORCHES accepted to MICRO 2025.
[08/2025] Started my PhD at UIUC with Prof. Minjia Zhang!
[05/2025] One paper accepted to ACL 2025 (LAMB).
[07/2024] KVMerger released on arXiv.
[05/2024] Two papers accepted to ICML 2024 (Attention Calibration for LLMs, Linearized-LLM).
[05/2024] One paper accepted to DAC 2024 (EDGE-LLM).
|
Publications
(* denotes equal contribution)
|
Efficient LLM Reasoning & Test-Time Compute
|
Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning
Ziyan Wang*, Zheng Wang*, Xingwei Qu, Qi Cheng, Jie Fu, Shengpu Tang, Minjia Zhang, Xiaoming Huo
ICLR 2026
[Paper]
[Website]
[Code]
|
|
Hidden States as Early Signals: Step-level Trace Evaluation and Pruning for Efficient Test-Time Scaling
Zhixiang Liang, Beichen Huang, Zheng Wang, Minjia Zhang
ACL Findings, 2026
[Paper]
[Code]
|
|
Think Hard Only When Needed: A Hybrid Best-of-N and Beam Search for Efficient Test-Time Compute
Hyewon Suh, Chaojian Li, Cheng-Jhih Shih, Zheng Wang, Kejing Xia, Yonggan Fu, Yingyan (Celine) Lin
EACL 2026
[Paper]
|
|
ORCHES: Orchestrated Test-Time-Compute-based LLM Reasoning on Collaborative GPU–PIM Heterogeneous System
Sixu Li, Yuzhou Chen, Chaojian Li, Yonggan Fu, Zheng Wang, Zhongzhi Yu, Haoran You, Zhifan Ye, Wei Zhou, Yongan Zhang, Yingyan (Celine) Lin
MICRO 2025
[Paper]
|
Efficient LLM & VLM Inference
|
From Collapse to Control: Understanding and Extending Context Length in Emerging Hybrid Models via Universal Position Interpolation
Haochen Shen, Davis Wertheimer, Zheng Wang, Garrett Goon, Derrick Liu, Naigang Wang, Mudhakar Srivatsa, Raghu K. Ganti, Minjia Zhang
ICLR 2026
[Paper]
|
|
PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed Inference
Yushu Zhao*, Zheng Wang*, Minjia Zhang
arXiv preprint, 2025
[Paper]
|
|
LAMB: A Training-Free Method to Enhance the Long-Context Understanding of SSMs via Attention-Guided Token Filtering
Zhifan Ye, Zheng Wang, Kejing Xia, Jihoon Hong, Leshu Li, Lexington A. Whalen, Cheng Wan, Haoran You, Yingyan (Celine) Lin, Souvik Kundu
ACL 2025
[Paper]
|
|
Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks
Zheng Wang, Boxiao Jin, Zhongzhi Yu, Minjia Zhang
arXiv preprint, 2024
[Paper]
|
|
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan (Celine) Lin
ICML 2024
[Paper]
[Code]
|
|
Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration
Zhongzhi Yu*, Zheng Wang*, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan (Celine) Lin
ICML 2024
[Paper]
[Code]
|
|
EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning & Voting
Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan (Celine) Lin
DAC 2024
[Paper]
[Code]
|
Teaching
- Teaching Assistant — CSE 6740 Computational Data Analysis (Machine Learning), Georgia Tech, Spring 2025. Instructor: Prof. Anqi Wu.
- Teaching Assistant — CSE 8803 Machine Learning for Neural/Behavior Data, Georgia Tech, Fall 2024. Instructor: Prof. Anqi Wu.
|
Services
- Conference Reviewer: ICML 2025, NeurIPS 2025, AAAI 2025, ICLR 2026, ICML 2026, ECCV 2026, NeurIPS 2026
|
Selected Awards
- [Jun. 2023] Outstanding Graduate of Beijing
- [Nov. 2022] Presidential Fellowship, 2021–2022 Academic Year
- [Nov. 2022] Xiaomi Special Scholarship, 2021–2022 Academic Year
|
|