Zheng Wang

I am a first-year PhD student in Computer Science at the University of Illinois Urbana-Champaign, advised by Prof. Minjia Zhang. My current research focuses on the following two directions:

(1) Leveraging sparsity, enhancing information processing efficiency, and combining systems and algorithms to accelerate inference and training for LLMs and VLMs.
(2) Using empirical and theoretical insights to better understand LLMs and VLMs and optimize their performance.

Prior to joining UIUC, I was very fortunate to work as a Research Assistant in the EIC Lab at the School of Computer Science, Georgia Tech, advised by Prof. Yingyan (Celine) Lin.

Outside of my academic life, I like to stay healthy by working out regularly. I'm also really into playing 🎾 tennis 🎾. It's a fun, challenging sport that keeps me both physically and mentally sharp.

Email  /  Google Scholar

Research

LAMB

LAMB: A Training-Free Method to Enhance the Long-Context Understanding of SSMs via Attention-Guided Token Filtering

Zhifan Ye, Zheng Wang, Kejing Xia, Jihoon Hong, Leshu Li, Lexington A. Whalen, Cheng Wan, Haoran You, Celine Lin, Souvik Kundu

63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025

PDF

KVMerger

Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks

Zheng Wang, Boxiao Jin, Zhongzhi Yu, Minjia Zhang

preprint

PDF

Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

Zhongzhi Yu*, Zheng Wang*, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan (Celine) Lin

2024 International Conference on Machine Learning, ICML 2024

PDF | Code

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan (Celine) Lin

2024 International Conference on Machine Learning, ICML 2024

PDF | Code

EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning & Voting

Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan Celine Lin

61st ACM/IEEE Design Automation Conference, DAC 2024

PDF | Code

XRouting: Explainable Vehicle Rerouting for Urban Road Congestion Avoidance using Deep Reinforcement Learning

Zheng Wang, Shen Wang

2022 IEEE International Smart Cities Conference, ISC2 2022

PDF | Code

Teaching

  • Teaching Assistant: CSE 8803 Machine Learning for Neural/Behavior Data, Georgia Tech, Fall 2024, instructor: Prof. Anqi Wu.
  • Teaching Assistant: CSE 6740 Computational Data Analysis (Machine Learning), Georgia Tech, Spring 2025, instructor: Prof. Anqi Wu.

Services

  • Conference Reviewer: ICML 2025, NeurIPS 2025, AAAI 2025, ICLR 2026

Selected Awards

  • [Jun. 2023] Excellent Graduates of Beijing
  • [Nov. 2022] Presidential Fellowship for the 2021-2022 Academic Year
  • [Nov. 2022] Xiaomi Special Scholarship for the 2021-2022 Academic Year

Design and source code from jonbarron.