Curriculum Vitae
Education
University of Science and Technology of China (USTC)
Sep. 2023 – Present
B.Eng. Candidate in Computer Science and Technology
Expected Graduation: June 2027
Publications
- [1] Bowen Xue*, Zheng-Peng Duan*, Qixin Yan, Wenjing Wang, Hao Liu, Chun-Le Guo, Chongyi Li, Chen Li, and Jing Lyu, Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation.
- [2] Bowen Xue*, Zihan Min*, Xingyang Li*, Muyang Li, Yujun Lin, Zhekai Zhang, Haocheng Xi, Lvmin Zhang, Maneesh Agrawala, Jun-Yan Zhu, and Song Han, FourTune: Towards Fully 4-Bit Efficient Post-Training for Diffusion Models.
- [3] Bowen Xue, Brandon Y. Feng, Chenguo Lin, Yuchen Lin, Yujia Zeng, Lvmin Zhang, Honglei Yan, and Panwang Pan, Ring Forcing: Towards Precise Long-Term Memory for Autoregressive Video Diffusion.
Experience
ByteDance
Sep. 2025 – Present
Research Intern
- Proposed Ring Forcing to address the challenge of constructing and utilizing memory in long video generation. Extended the effective history span by 30× under a fixed sequence length, achieving minute-level memory and maintaining long-term consistency.
Long Video Generation, AR Diffusion
MIT HANLab
Apr. 2025 – Feb. 2026
Research Intern (Remote)
- Designed FourTune, the first training framework for diffusion models with 4-bit weights, activations, and gradients. Reduced VRAM usage by 2.25× and accelerated training by 2.27× on FLUX.1-dev compared to 16-bit LoRA, while also supporting Qwen-Image. Matched full-precision performance on customization, RL, and distillation tasks.
- Core contributor to nunchaku (3.7K Stars) and ComfyUI-nunchaku (2.8K Stars), an inference-acceleration framework for 4-bit quantized diffusion models. Integrated PuLID into the nunchaku inference framework, significantly boosting inference speed and reducing VRAM usage while preserving identity fidelity and image quality.
Diffusion Model, Quantization, Acceleration, Post-Training
Tencent
Nov. 2024 – Sep. 2025
Research Intern
- Proposed Stand-In (CVPR 2026), a lightweight and plug-and-play identity control framework for video generation, achieving SOTA face similarity and naturalness with minimal parameter and training costs.
- Designed a novel face-swapping algorithm that outperforms IP-Adapter and InstantID with a 4× inference speedup. Conducted large-scale SDXL fine-tuning for high-quality generation, successfully deploying the pipeline in WeChat Channels.
AIGC, Video Generation, IP2V, IP2I, Diffusion Model
Others
- Competition: CCKS2025 Large Model Generated Text Detection - Rank 1/1094 (Leaderboard A)