| 
            
              | 
                  Zheng Wang
                 I am a first-year PhD student in Computer Science at the University of Illinois Urbana-Champaign, and my advisor is Prof. Minjia Zhang. My current research interests focus on the following two directions:
 (1) Leveraging sparsity, enhancing information processing efficiency, and combining systems and algorithms to accelerate inference and training for LLMs and VLMs.
 (2) Utilizing empirical and theoretical insights to deeply understand LLMs and VLMs and optimize their performance.
  Prior to joining UIUC, I was very fortunate to be advised by Prof. Yingyan (Celine) Lin of the EIC Lab as a Research Assistant in the School of Computer Science at Georgia Tech.
		Outside of my academic life, I like to stay healthy by working out regularly. I'm also really into playing 🎾 tennis 🎾: it's a fun, challenging sport that keeps me both physically and mentally sharp.
                  Email / Google Scholar
 
                 |   |  
|  | 
    LAMB: A Training-Free Method to Enhance the Long-Context Understanding of SSMs via Attention-Guided Token Filtering
    Zhifan Ye, Zheng Wang, Kejing Xia, Jihoon Hong, Leshu Li, Lexington A. Whalen, Cheng Wan, Haoran You, Celine Lin, Souvik Kundu
    63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
      PDF
     |  
|  | 
    Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks
    Zheng Wang, Boxiao Jin, Zhongzhi Yu, Minjia Zhang
    Preprint
      PDF
     |  
|  | 
    Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration
    Zhongzhi Yu*, Zheng Wang*, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan (Celine) Lin
    2024 International Conference on Machine Learning, ICML 2024
      PDF |
      Code |
     |  
|  | 
    When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
    Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan (Celine) Lin
    2024 International Conference on Machine Learning, ICML 2024
      PDF |
      Code |
     |  
|  | 
    EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning & Voting
    Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan Celine Lin
    61st ACM/IEEE Design Automation Conference, DAC 2024
      PDF |
      Code |
     |  
|  | 
    XRouting: Explainable Vehicle Rerouting for Urban Road Congestion Avoidance using Deep Reinforcement Learning
    Zheng Wang, Shen Wang
    2022 IEEE Smart City Conference, ISC2 2022
      PDF |
      Code |
     |  
  
    
      | Teaching
          
            Teaching Assistant, CSE 8803 Machine Learning for Neural/Behavior Data, Georgia Tech, Fall 2024. Instructor: Prof. Anqi Wu.
          
            Teaching Assistant, CSE 6740 Computational Data Analysis (Machine Learning), Georgia Tech, Spring 2025. Instructor: Prof. Anqi Wu.
           |  
  
    
      
      | Services
          Conference Reviewer: ICML 2025, NeurIPS 2025, AAAI 2025, ICLR 2026 |  
  
    
      
      | Selected Awards
          [Jun. 2023] Excellent Graduates of Beijing
          [Nov. 2022] Presidential Fellowship in 2021-2022 Academic Year
          [Nov. 2022] Xiaomi Special Scholarship in 2021-2022 Academic Year |  |