Zheng Wang

I am a first-year PhD student in Computer Science at the University of Illinois Urbana-Champaign, advised by Prof. Minjia Zhang. My current research focuses on the following two directions:

(1) Leveraging sparsity, improving information processing efficiency, and combining systems and algorithms to accelerate inference and training for LLMs and VLMs.
(2) Utilizing empirical and theoretical insights to deeply understand LLMs and VLMs and optimize their performance.

Prior to joining UIUC, I was very fortunate to work as a Research Assistant in the School of Computer Science at Georgia Tech, advised by Prof. Yingyan (Celine) Lin of the EIC Lab.

Outside of my academic life, I like to stay healthy by working out regularly. I'm also really into playing 🎾 tennis 🎾; it's a fun, challenging sport that keeps me both physically and mentally sharp.

Email / Google Scholar

Research

	LAMB: A Training-Free Method to Enhance the Long-Context Understanding of SSMs via Attention-Guided Token Filtering Zhifan Ye, Zheng Wang, Kejing Xia, Jihoon Hong, Leshu Li, Lexington A. Whalen, Cheng Wan, Haoran You, Celine Lin, Souvik Kundu 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 PDF
	Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks Zheng Wang, Boxiao Jin, Zhongzhi Yu, Minjia Zhang preprint PDF
	Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration Zhongzhi Yu, Zheng Wang, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan (Celine) Lin 2024 International Conference on Machine Learning, ICML 2024 PDF | Code
	When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan (Celine) Lin 2024 International Conference on Machine Learning, ICML 2024 PDF | Code
	EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning & Voting Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan Celine Lin 61st ACM/IEEE Design Automation Conference, DAC 2024 PDF | Code
	XRouting: Explainable Vehicle Rerouting for Urban Road Congestion Avoidance using Deep Reinforcement Learning Zheng Wang, Shen Wang 2022 IEEE Smart City Conference, ISC2 2022 PDF | Code

Teaching

Teaching Assistant CSE 8803 Machine Learning for Neural/Behavior Data, Georgia Tech, 2024 Fall, instructor: Prof. Anqi Wu.
Teaching Assistant CSE 6740 Computational Data Analysis (Machine Learning), Georgia Tech, 2025 Spring, instructor: Prof. Anqi Wu.

Services

Conference Reviewer ICML 2025, NeurIPS 2025, AAAI 2025, ICLR 2026

Selected Awards

[Jun. 2023] Excellent Graduates of Beijing
[Nov. 2022] Presidential Fellowship in 2021-2022 Academic Year
[Nov. 2022] Xiaomi Special Scholarship in 2021-2022 Academic Year

Design and source code from jonbarron.
