
 

Kun Li (李琨)
Senior Researcher, Systems and Networking Research Group, Microsoft Research Asia

Research interests: scientific AI; high-performance parallel algorithms & LLM systems

Brief Biography

  • Dr. Kun Li has been a Senior Researcher in the Systems and Networking Research Group at Microsoft Research Asia since July 2022. He earned his Ph.D. from the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS) in 2022. His dissertation, "Research and Application on Multi-level Discontinuous and Nonlinear Scalability for Massively Parallelism", received the CCF Outstanding Doctoral Dissertation Award and the ACM SIGHPC China Outstanding Doctoral Dissertation Award. He has also received the CCF HPC Young Scientist Award and the ACM SIGHPC China Rising Star Award, among others. He was a keynote speaker at the CCF HPCChina 2024 conference, and he is an Executive Member of the CCF Technical Committee on High-Performance Computing and a member of the CCF Technical Committee on Computer Architecture.
  • Dr. Kun Li has published extensively in top-tier international conferences and journals. Notably, his Cloud4Science line of work has appeared regularly at premier HPC conferences such as SC and PPoPP, and received the Best Paper Award at PPoPP 2024. He currently leads the Scientific AI & LLM System project. His research addresses key challenges in scientific reasoning, scalability, and LLM inference efficiency, including breakthroughs in long-text generation and low-latency systems. He actively collaborates with academic and industry partners to bridge HPC and AI, enabling scalable, high-performance solutions for scientific intelligence and large-scale AI systems. If you're interested, feel free to join him in exploring these exciting frontiers!
News

    • [Nov. 2024] Our paper "FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units" was accepted to PPoPP'25. Congratulations to Haozhi!
    • [Nov. 2024] Our paper "Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers" was accepted to PPoPP'25. Congratulations to Yiwei!
    • [Oct. 2024] Received the 2024 ACM SIGHPC China Rising Star Award! [More]
    • [Sep. 2024] Received the 2024 CCF HPC Young Scientist Award! [More]
    • [Aug. 2024] Our paper "LONG EXPOSURE: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity" was accepted to SC'24. Congratulations to Tuowei!
    • [Aug. 2024] Our paper "LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores" was accepted to SC'24. Congratulations to Yiwei!
    • [Mar. 2024] Our paper "ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores" won the PPoPP'24 Best Paper Award!

    Selected Publications

      *: Corresponding author.
    • [To appear]   Tuowei Wang, Kun Li *, Donglin Bai, Fusong Ju, Leo Xia, Ju Ren, Yaoxue Zhang, Ting Cao, Mao Yang. Matryoshka: Optimization of Dynamic Diverse Quantum Chemistry Systems via Elastic Parallelism Transformation. [Paper]
    • [PPOPP'25]   Haozhi Han, Kun Li *, Wei Cui, Donglin Bai, Yiwei Zhang, Liang Yuan, Yifeng Chen, Yunquan Zhang, Ting Cao, Mao Yang. FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units. [Paper]
    • [PPOPP'25]   Yiwei Zhang, Kun Li *, Liang Yuan, Haozhi Han, Yunquan Zhang, Ting Cao, Mao Yang. Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers. [Paper]
    • [SC'24]   Yiwei Zhang, Kun Li *, Liang Yuan, Jiawen Cheng, Yunquan Zhang, Ting Cao, Mao Yang. LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores. [Paper]
    • [SC'24]   Tuowei Wang, Kun Li *, Zixu Hao, Donglin Bai, Ju Ren, Yaoxue Zhang, Ting Cao, Mao Yang. LONG EXPOSURE: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity. [Paper]
    • [IPDPS'24]   Luhan Wang, Haipeng Jia, Lei Xu, Cunyang Wei, Kun Li, Xianmeng Jiang, Yunquan Zhang. VNEC: A Vectorized Non-Empty Column Format for SpMV on CPUs.
    • [PPOPP'24, Best Paper Award]   Yuetao Chen, Kun Li *, Yuhao Wang, Donglin Bai, Lei Wang, Lingxiao Ma, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang. ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores. [Paper]
    • [ICS'23]   Tun Chen, Haipeng Jia, Yunquan Zhang, Kun Li, Zhihao Li, Xiang Zhao, Jianyu Yao. OpenFFT: An Adaptive Tuning Framework for 3D FFT on ARM Multicore CPUs.
    • [TPDS'23]   Hang Cao, Liang Yuan, He Zhang, Yunquan Zhang, Baodong Wu, Kun Li, Shigang Li, Minghua Zhang, Pengqi Lu, and Junmin Xiao. AGCM-3DLF: Accelerating Atmospheric General Circulation Model via 3D Parallelization and Leap-Format.
    • [HPCC'22]   Luhan Wang, Haipeng Jia, Yunquan Zhang, Kun Li, and Cunyang Wei. EgpuIP: An Embedded GPU Accelerated Library for Image Processing.
    • [HPCC'22]   Cunyang Wei, Haipeng Jia, Yunquan Zhang, Kun Li, and Luhan Wang. LBBGEMM: A Load-Balanced Batch GEMM Framework on ARM CPUs.
    • [IPDPS'22]   Kun Li, Liang Yuan, Yunquan Zhang, Yue Yue, and Hang Cao. An Efficient Vectorization Scheme for Stencil Computation. [Paper]
    • [TPDS'22]   Kun Li, Liang Yuan, Yunquan Zhang, and Gongwei Chen. An Accurate and Efficient Large-scale Regression Method through Best Friend Clustering. [Paper]
    • [SC'21]   Kun Li, Liang Yuan, Yunquan Zhang, and Yue Yue. Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations. [Paper]
    • [SC'21]   Liang Yuan, Hang Cao, Yunquan Zhang, Kun Li, Pengqi Lu, and Yue Yue. Temporal Vectorization for Stencils. [Paper]
    • [SC'19]   Kun Li, Honghui Shang, Yunquan Zhang, Shigang Li, Baodong Wu, Dong Wang, Libo Zhang, Fang Li, Dexun Chen, and Zhiqiang Wei. OpenKMC: a KMC design for hundred-billion-atom simulation using millions of cores on Sunway Taihulight. (Acceptance rate: 22.7%, 78/344) [Paper]
    • [ISPA'19]   Kun Li, Shigang Li, Bei Wang, Yifeng Chen, and Yunquan Zhang. swMD: Performance Optimizations for Molecular Dynamics Simulation on Sunway Taihulight. [Paper]
    • [JSUPERCOMPUT'19]   Kun Li, Shigang Li, Shan Huang, Yifeng Chen, and Yunquan Zhang. FastNBL: fast neighbor lists establishment for molecular dynamics simulation based on bitwise operations. The Journal of Supercomputing (2019): 1-20. [Paper]
    • [ICPP'18]   Junmin Xiao, Shigang Li, Baodong Wu, He Zhang, Kun Li, Erlin Yao, Yunquan Zhang, and Guangming Tan. Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model. [Paper]
    • [JCST'17]   Kun Li, Haipeng Jia, Ting Cao, and Yunquan Zhang. The Implementation and Optimization of Multidimensional FFT Algorithm on Large-scale Clusters. The Journal of Frontiers of Computer Science and Technology, 2017. [Paper]
    • [HPCChina'16]   Kun Li, Yan Li, Ting Cao, Haipeng Jia, and Yunquan Zhang. An MPI-based 3D FFT Implementation on CPUGPU Heterogeneous Clusters. National Annual Conference on High Performance Computing 2016.
    Selected Awards & Position

      Media

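      • Dec. 23, 2024. Featured by Microsoft Research, "Toward Zettascale Computing: Accelerating Scientific Discovery with the Cloud4Science". [Microsoft] [Wechat] [Synced] [Zhihu] [Tencent] [Ithome]
      • Feb. 24, 2023. Featured by Microsoft Research, "Science Craftsmen | Kun Li: The 'Model Kid Next Door' Devoted to High-Performance Computing Research". [Microsoft] [Wechat] [Bilibili] [Zhihu] [Tencent]
      • Jan. 10, 2023. Featured by ICT, CAS, "Academic Research | Two ICT Dissertations Selected for the 2022 CCF Outstanding Doctoral Dissertation Incentive Program". [CAS] [Wechat]
      • Jul. 20, 2022. Featured by ICT, CAS, "Graduate Stories | To Meet You, Never Giving Up Through Countless Attempts". [CAS]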

      Talks


      Last updated on 12/30/2024.