Curriculum Vitae
Education
University of Science and Technology of China (USTC)
Sep. 2023 – Present
B.Eng. Candidate in Computer Science and Technology
Expected Graduation: June 2027
Publications
- [1] Bowen Xue*, Zheng-Peng Duan*, Qixin Yan, Wenjing Wang, Hao Liu, Chun-Le Guo, Chongyi Li, Chen Li, and Jing Lyu, Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation.
- [2] Bowen Xue*, Zihan Min*, Xingyang Li*, Muyang Li, Yujun Lin, Zhekai Zhang, Haocheng Xi, Lvmin Zhang, Maneesh Agrawala, Jun-Yan Zhu, and Song Han, FourTune: Towards Fully 4-Bit Efficient Post-Training for Diffusion Models.
- [3] Bowen Xue, Brandon Y. Feng, Chenguo Lin, Yuchen Lin, Yujia Zeng, Lvmin Zhang, Honglei Yan, and Panwang Pan, Ring Forcing: Towards Precise Long-Term Memory for Autoregressive Video Diffusion.
Experience
ByteDance
Sep. 2025 – Present
Research Intern
- Proposed Ring Forcing to address the challenge of constructing and utilizing memory in long video generation. Extended the effective history span by 30× under a fixed sequence length, achieving minute-level memory and maintaining long-term consistency.
Long Video Generation, AR Diffusion
MIT HANLab
Apr. 2025 – Feb. 2026
Research Intern (Remote)
- Designed FourTune, the first training framework for diffusion models with 4-bit weights, activations, and gradients. Reduced VRAM usage by 2.25× and accelerated training by 2.27× on FLUX.1-dev compared to 16-bit LoRA, while also supporting Qwen-Image. Matched full-precision performance on customization, RL, and distillation tasks.
- Core contributor to nunchaku (3.7K Stars) and ComfyUI-nunchaku (2.8K Stars), an inference-acceleration framework for 4-bit quantized diffusion models. Integrated PuLID into the nunchaku inference framework, significantly boosting inference speed and reducing VRAM usage while preserving identity fidelity and image quality.
Diffusion Model, Quantization, Acceleration, Post-Training
Tencent
Nov. 2024 – Sep. 2025
Research Intern
- Proposed Stand-In (CVPR 2026), a lightweight and plug-and-play identity control framework for video generation, achieving SOTA face similarity and naturalness with minimal parameter and training costs.
- Designed a novel face-swapping algorithm that outperforms IP-Adapter and InstantID with a 4× inference speedup. Conducted large-scale SDXL fine-tuning for high-quality generation, successfully deploying the pipeline in WeChat Channels.
AIGC, Video Generation, IP2V, IP2I, Diffusion Model
Others
- Competition: CCKS2025 Large Model Generated Text Detection - Rank 1/1094 (Leaderboard A)