Zheng Wang

I am a second-year master's student in CSE-COC at the Georgia Institute of Technology. I plan to apply for PhD programs starting in Fall 2025. My current research interests focus on two directions:
(1) Leveraging sparsity, improving information-processing efficiency, and exploiting hardware-efficient implementations to accelerate inference and training for LLMs and VLMs.
(2) Drawing on empirical and theoretical insights to deepen the understanding of LLMs and VLMs and to optimize their performance.

I am very fortunate to be advised by Prof. Yingyan (Celine) Lin of the EIC Lab as a Research Assistant in the School of Computer Science at Georgia Tech. Additionally, I was advised by Prof. Minjia Zhang as a 2024 summer research intern in the Department of Computer Science at the University of Illinois Urbana-Champaign.

Outside of my academic life, I stay healthy by working out regularly. I'm also really into playing 🎾 tennis 🎾: it's a fun, challenging sport that keeps me both physically and mentally sharp.

Email / CV


Research


LAMB: A Training-Free Method to Enhance the Long-Context Understanding of SSMs via Attention-Guided Token Filtering

Zhifan Ye, Zheng Wang, Kejing Xia, Jihoon Hong, Leshu Li, Lexington Whalen, Cheng Wan, Yonggan Fu, Yingyan (Celine) Lin, Souvik Kundu

Under review at ACL 2025


SpotVLM: A Tuning-Free Framework for Efficient and Scalable Video VLMs via Anchor-Based Summarization and Predictive Spotlight

Zhongzhi Yu, Zheng Wang, Chaojian Li, Hongxu Yin, Jihoon Hong, Yonggan Fu, Zhenyang Chen, Jan Kautz, Pavlo Molchanov, Yingyan (Celine) Lin

Under review at ICML 2025


Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks

Zheng Wang, Boxiao Jin, Yuming Chang, Zhongzhi Yu, Minjia Zhang

Under review at ICML 2025

PDF


Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

Zhongzhi Yu*, Zheng Wang*, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan (Celine) Lin

International Conference on Machine Learning, ICML 2024

PDF | Code


When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan (Celine) Lin

International Conference on Machine Learning, ICML 2024

PDF | Code


EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning & Voting

Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan (Celine) Lin

61st ACM/IEEE Design Automation Conference, DAC 2024

PDF | Code


XRouting: Explainable Vehicle Rerouting for Urban Road Congestion Avoidance using Deep Reinforcement Learning

Zheng Wang, Shen Wang

IEEE International Smart Cities Conference, ISC2 2022

PDF | Code

Teaching

  • Teaching Assistant, CSE 8803 Machine Learning for Neural/Behavior Data, Georgia Tech, Fall 2024, instructor: Prof. Anqi Wu.
  • Teaching Assistant, CSE 6740 Computational Data Analysis (Machine Learning), Georgia Tech, Spring 2025, instructor: Prof. Anqi Wu.

Selected Awards

  • [Jun. 2023] Excellent Graduate of Beijing
  • [Nov. 2022] Presidential Fellowship, 2021-2022 Academic Year
  • [Nov. 2022] Xiaomi Special Scholarship, 2021-2022 Academic Year

Design and source code from jonbarron.