works

2026

  1. MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction
    Junbo Cui, Bokai Xu, Chongyi Wang, Tianyu Yu, Weiyue Sun, Yingjing Xu, Tianran Wang, Zhihui He, Wenshuo Ma, Tianchi Cai, Jiancheng Gui, Luoyuan Zhang, Xian Sun, Fuwei Huang, Moye Chen, Zhuo Lin, Hanyu Liu, Qingxin Gui, Qingzhe Han, Yuyang Wen, Huiping Liu, Rongkang Wang, Yaqi Zhang, Hongliang Wei, Chi Chen, You Li, Kechen Fang, Jie Zhou, Yuxuan Li, Guoyang Zeng, Chaojun Xiao, Yankai Lin, Xu Han, Maosong Sun, Zhiyuan Liu, and Yuan Yao
    2026
  2. Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction
    Chaoqun He, Mingyang Xiang, Yingjing Xu, Bokai Xu, Junbo Cui, Jie Zhou, Yuan Yao, and Lijie Wen
    2026
  3. Liberating LLM Capabilities in Full-Duplex Speech Models
    Luoyuan Zhang, Bokai Xu, Junbo Cui, Weiyue Sun, Yingjing Xu, Hanyu Liu, and Yuan Yao
    2026

2025

  1. MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech, and Multimodal Live Streaming on Your Phone
    Yuan Yao, Tianyu Yu, Chongyi Wang, Junbo Cui, Bokai Xu, Hongji Zhu, Tianchi Cai, Fuwei Huang, Tianran Wang, Wenshuo Ma, Yixuan Zhou, Haoye Zhang, Zonghao Guo, Chi Chen, Haoyu Wang, Zhihui He, Haoyu Li, Hanyu Liu, Luoyuan Zhang, Ge Zhou, Siyuan Li, Zhi Zheng, Jie Zhou, Yuxuan Li, Kaihuo Zhang, Yudong Mei, Hanqing Zhao, Yueying Chen, Zhongwu Zhai, Hanbin Wang, Ganqu Cui, Ning Ding, Xu Han, Zhiyong Wu, Zhiyuan Liu, and Maosong Sun
    2025
  2. MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
    Tianyu Yu, Zefan Wang, Chongyi Wang, Fuwei Huang, Wenshuo Ma, Zhihui He, Tianchi Cai, Weize Chen, Yuxiang Huang, Yuanqian Zhao, Bokai Xu, Junbo Cui, Yingjing Xu, Liqing Ruan, Luoyuan Zhang, Hanyu Liu, Jingkun Tang, Hongyuan Liu, Qining Guo, Wenhao Hu, Bingxiang He, Jie Zhou, Jie Cai, Ji Qi, Zonghao Guo, Chi Chen, Guoyang Zeng, Yuxuan Li, Ganqu Cui, Ning Ding, Xu Han, Yuan Yao, Zhiyuan Liu, and Maosong Sun
    2025

2024

  1. VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
    Shi Yu*, Chaoyue Tang*, Bokai Xu*, Junbo Cui*, Junhao Ran, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, and Maosong Sun
    ICLR 2025, 2024

2023

  1. Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
    Ning Ding*, Yulin Chen*, Bokai Xu, Yujia Qin, Zhi Zheng, Shengding Hu, Zhiyuan Liu, Maosong Sun, and Bowen Zhou
    EMNLP 2023, 2023
  2. Tool Learning with Foundation Models
    Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Yufei Huang, Chaojun Xiao, Chi Han, Yi Ren Fung, Yusheng Su, Huadong Wang, Cheng Qian, Runchu Tian, Kunlun Zhu, Shihao Liang, Xingyu Shen, Bokai Xu, Zhen Zhang, Yining Ye, Bowen Li, Ziwei Tang, Jing Yi, Yuzhang Zhu, Zhenning Dai, Lan Yan, Xin Cong, Yaxi Lu, Weilin Zhao, Yuxiang Huang, Junxi Yan, Xu Han, Xian Sun, Dahai Li, Jason Phang, Cheng Yang, Tongshuang Wu, Heng Ji, Zhiyuan Liu, and Maosong Sun
    ACM Computing Surveys, 2023


manuscripts

2023

  1. Brain Reconstruction by Self Supervised Semantic Stitching of Non-overlapping 3D Microscopic Image
    Bokai Xu, Chaoyu Yang, Fang Xu, and Pengcheng Zhou
    2023

2021

  1. A Social Network Platform Architecture Based on Markov Logic and Transformer Based Neuron Translation
    Bokai Xu and Zhuoyu Li
    2021