| Time | What |
| --- | --- |
| 08:00 | Opening |
| 08:15 | Keynote: *AI's Memory Challenges*, Jae W. Lee (Seoul National University) |
| 09:15 | Break |
| 09:45 | **Session 1: LLM Systems at Scale**<br>*Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits*. Dowon Kim (Hanyang University), Minjae Lee (Hanyang University), Janghyeon Kim (Hanyang University), HyuckSung Kwon (Hanyang University), Hyeonggyu Jeong (Hanyang University), Sang-Soo Park (Samsung Electronics), Minyong Yoon (Samsung Electronics), Si-Dong Roh (Samsung Electronics), Jinin So (Samsung Electronics), Jungwook Choi (Hanyang University), and Yongsuk Kwon (Samsung Electronics)<br>*SPipe: Hybrid GPU and CPU Pipeline for Training LLMs under Memory Pressure*. Junyeol Ryu (Seoul National University), Yujin Jeong (Seoul National University), Daeyoung Park (Seoul National University), Jinpyo Kim (Seoul National University), Heehoon Kim (Seoul National University), and Jaejin Lee (Seoul National University)<br>*ScaleMoE: A Fast and Scalable Distributed Training Framework for Large-Scale Mixture-of-Experts Models*. Seohong Choi (Sungkyunkwan University), Huize Hong (Sungkyunkwan University), Tae Hee Han (Sungkyunkwan University), and Joonsung Kim (Sungkyunkwan University)<br>*LibraPIM: Dynamic Load Rebalancing to Maximize Utilization in PIM-Assisted LLM Inference Systems*. Hyeongjun Cho (Sungkyunkwan University), Yoonho Jang (Sungkyunkwan University), Hyungi Kim (Sungkyunkwan University), Seongwook Kim (Sungkyunkwan University), Keewon Kwon (Sungkyunkwan University), Gwangsun Kim (POSTECH), and Seokin Hong (Sungkyunkwan University)<br>*Doppeladler: Adaptive Tensor Parallelism for Latency-Critical LLM Deployment on CPU-GPU Integrated End-User Device*. Jiazhi Jiang (Sun Yat-sen University), Xiao Liu (Sun Yat-sen University), Jiangsu Du (Sun Yat-sen University), Dan Huang (Sun Yat-sen University), and Yutong Lu (Sun Yat-sen University) |
| 12:00 | Lunch |
| 13:15 | **Session 2: Memory Systems & Caching**<br>*Exploring Memory Tiering Systems in the CXL Era via FPGA-based Emulation and Device-Side Management*. Yiqi Chen (Peking University), Xiping Dong (Peking University), Zhe Zhou (Peking University), Zhao Wang (Peking University), Jie Zhang (Peking University), and Guangyu Sun (Peking University)<br>*CPC: Coordinated Page Cache for Serverless Computing*. Keun Soo Lim (Seoul National University), Yunjay Hong (Seoul National University), Jongheon Jeong (Seoul National University), Sam Son (UC Berkeley), Donguk Kim (Seoul National University), Yeonhong Park (Seoul National University), Jae W. Lee (Seoul National University), and Jinkyu Jeong (Yonsei University)<br>*SCREME: A Scalable Framework for Resilient Memory Design*. Fan Li (University of Central Florida), Mimi Xie (University of Texas at San Antonio), Yanan Guo (University of Rochester), Huize Li (University of Central Florida), and Xin Xin (University of Central Florida)<br>*Cache Miss Curve Analysis via Cardinality Domain*. Eishi Arima (Technical University of Munich) and Martin Schulz (Technical University of Munich)<br>*EARTH: Efficient Architecture for RISC-V Vector Memory Access*. Hongyi Guan (Tsinghua University), Yichuan Gao (Intel Labs China), Chenlu Miao (Intel Labs China), Haoyang Wu (Intel Labs China), Hang Zhu (Independent Researcher), Mingfeng Lin (Shenzhen University), and Huayue Liang (Intel Labs China) |
| 15:30 | Break |
| 16:00 | **Session 3: GPU Algorithms for Irregular Workloads & OLAP**<br>*ANG: Accelerating NFA Processing on GPUs via Exploring Multi-Level Fine-Grained Parallelism*. Yuguang Wang (Michigan Technological University), Yunmo Zhang (City University of Hong Kong), Zeyu Liu (City University of Hong Kong), Junqiao Qiu (City University of Hong Kong), and Zhenlin Wang (Michigan Technological University)<br>*Accelerating DFS-based Subgraph Matching on GPU via Reusing Intersection*. Chen Chen (National University of Defense Technology), Shanzhi Gu (National University of Defense Technology), Junsheng Chang (National University of Defense Technology), and Li Shen (National University of Defense Technology)<br>*Multiway Merge Partitioning for Sparse-Sparse Matrix Multiplication on GPUs*. Eric Lorimer (Georgia Institute of Technology), Ruobing Han (Georgia Institute of Technology), Sung Ha Kang (Georgia Institute of Technology), and Hyesoon Kim (Georgia Institute of Technology)<br>*DMO-DB: Mitigating the Data Movement Bottlenecks of GPU-Accelerated Relational OLAP*. Chaemin Lim (Yonsei University), Suhyun Lee (Yonsei University), Jinwoo Choi (Yonsei University), Joonsung Kim (Sungkyunkwan University), Jinho Lee (Seoul National University), and Youngsok Kim (Yonsei University) |
| 18:00 | Reception and Posters |
	
| Time | What |
| --- | --- |
| 08:15 | Keynote: *Efficient Big Graph Analytics Via Redundancy Reduction*, Rajiv Gupta (University of California, Riverside) |
| 09:15 | Break |
| 09:45 | **Session 4: Compilers & Program Generation**<br>*LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers*. Massinissa Merouani (New York University Abu Dhabi), Afif Boudaoud (New York University Abu Dhabi), Iheb Nassim Aouadj (New York University Abu Dhabi), Nassim Tchoulak (École Nationale Supérieure d'Informatique), Islem Kara Bernou (École Nationale Supérieure d'Informatique), Hamza Benyamina (New York University Abu Dhabi), Fatima Benbouzid-Si Tayeb (École Nationale Supérieure d'Informatique), Karima Benatchba (École Nationale Supérieure d'Informatique), Hugh Leather (Meta), and Riyadh Baghdadi (New York University Abu Dhabi)<br>*Agentic Auto-Scheduling: An Experimental Study of LLM-Based Loop Optimization*. Massinissa Merouani (New York University Abu Dhabi), Islem Kara Bernou (New York University Abu Dhabi), and Riyadh Baghdadi (New York University Abu Dhabi)<br>*Guess, Measure & Edit: Using Lowering to Lift Tensor Code*. José Wesley De Souza Magalhães (University of Edinburgh), Jackson Woodruff (University of Edinburgh), Jordi Armengol-Estapé (University of Edinburgh), Alexander Brauckmann (University of Edinburgh), Luc Jaulmes (University of Edinburgh), Elizabeth Polgreen (University of Edinburgh), and Michael O'Boyle (University of Edinburgh)<br>*Automatic Generation of Actor-based Parallelism from Shared Memory Parallel Programs*. Jun Shirako (Georgia Institute of Technology) and Vivek Sarkar (Georgia Institute of Technology)<br>*Automatic Code-Generation for Accelerating Structured-Mesh-Based Explicit Numerical Solvers on FPGAs*. Beniel Thileepan (University of Warwick), Suhaib A. Fahmy (King Abdullah University of Science and Technology), and Gihan R. Mudalige (University of Warwick) |
| 12:00 | Lunch / Business Meeting |
| 13:15 | **Session 5: Specialized Accelerators**<br>*FLASH: An Abstract Machine for Modeling Fully Homomorphic Encryption Accelerators*. Alireza Tabatabaeian (Simon Fraser University) and Arrvindh Shriraman (Simon Fraser University)<br>*Energy-Efficient Acceleration of Hash-Based Post-Quantum Cryptographic Schemes on Embedded Spatial Architectures*. Yanze Wu (George Mason University) and Md Tanvir Arafin (George Mason University)<br>*Fine-Grained Fusion: The Missing Piece in Area-Efficient State Space Model Acceleration*. Robin Geens (MICAS, KU Leuven), Arne Symons (MICAS, KU Leuven), and Marian Verhelst (MICAS, KU Leuven)<br>*Squire: A General-Purpose Accelerator to Exploit Fine-Grain Parallelism on Dependency-Bound Kernels*. Rubén Langarita (Barcelona Supercomputing Center), Pablo Ibáñez-Marín (Universidad de Zaragoza), Jesús Alastruey-Benedé (Universidad de Zaragoza), Miquel Moreto (UPC/BSC), Santiago Marco-Sola (Universitat Politècnica de Catalunya / Barcelona Supercomputing Center), and Adrià Armejach (Barcelona Supercomputing Center)<br>*Bancroft: Genomics Acceleration Beyond On-Device Memory*. Se-Min Lim (University of California, Irvine), Seongyoung Kang (University of California, Irvine), and Sang-Woo Jun (University of California, Irvine) |
| 15:30 | Break |
| 16:00 | **Session 6: Edge & Mobile AI Systems**<br>*Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations*. Yujeong Choi (Google), John Kim (KAIST), and Minsoo Rhu (KAIST)<br>*Salient Store: Enabling Smart Storage for Continuous Learning Edge Servers*. Cyan Subhra Mishra (The Pennsylvania State University), Deeksha Chaudhary (The Pennsylvania State University), Mahmut T. Kandemir (The Pennsylvania State University), and Chita Das (The Pennsylvania State University)<br>*Bit-Level Semantics: Scalable RAG Retrieval with Neurosymbolic Hyperdimensional Computing*. Hyunsei Lee (DGIST), Shinhyoung Jang (DGIST), Jaewoo Gwak (DGIST), Jongho Park (DGIST), and Yeseong Kim (DGIST)<br>*Optimizing 3D Gaussian Splatting for Mobile GPUs*. Md Musfiqur Rahman Sanim (University of Georgia), Zhihao Shu (University of Georgia), Bahram Afsharmanesh (University of Georgia), AmirAli Mirian (University of Georgia), Jiexiong Guan (William & Mary), Wei Niu (University of Georgia), Bin Ren (William & Mary), and Gagan Agrawal (University of Georgia) |
	
| Time | What |
| --- | --- |
| 08:15 | **Session 7: Communication, Profiling & Mapping across CPU–GPU Clusters**<br>*Generating Two-Level, GPU-Aware Mappings for Distributed Tensor Computations*. Botao Wu (The Ohio State University) and Martin Kong (The Ohio State University)<br>*GPU Stream-Aware Communication for Effective Pipelining*. Naveen Namashivayam (University of Minnesota at Twin Cities), Krishna Kandalla (Hewlett Packard Enterprise), Pen-Chung Yew (University of Minnesota at Twin Cities), James B. White III (Hewlett Packard Enterprise), Larry Kaplan (Hewlett Packard Enterprise), and Mark Pagel (Hewlett Packard Enterprise)<br>*TPE: XPU-Point: Simulator-Agnostic Sample Selection Methodology for Heterogeneous CPU-GPU Applications*. Alen Sabu (Arm), Harish Patil (Intel), Wim Heirman (Intel), Changxi Liu (National University of Singapore), and Trevor E. Carlson (National University of Singapore) |
| 09:30 | Break |
| 09:45 | SRC Poster Presentations |
| 10:45 | **Session 8: Novel Parallel Architectures & Runtime Mechanisms**<br>*A Stable Marriage Requires a Shared Residence with Low Contention and Mutual Complementarity*. Jiaxin Liu (The Ohio State University), Rubao Lee (Freelance), Cathy Xia (The Ohio State University), and Xiaodong Zhang (The Ohio State University)<br>*Optimize Winograd Convolution for a Novel MIMD Many-core Architecture PEZY-SC3s*. Yi Zhou (National University of Defense Technology), Qinglin Wang (National University of Defense Technology), Lian Wang (Shanxi Supercomputing Center), Zhiyan Liu (Shanxi Supercomputing Center), Bingwei Wang (Shanxi Supercomputing Center), Feiming Liu (Shanxi Supercomputing Center), Xiangdong Pei (Shanxi Supercomputing Center), and Jie Liu (National University of Defense Technology)<br>*CoroAMU: Unleashing Memory-Driven Coroutines through Latency-Aware Decoupled Operations*. Zhuolun Jiang (Institute of Computing Technology, Chinese Academy of Sciences), Songyue Wang (Institute of Computing Technology, Chinese Academy of Sciences), Xiaokun Pei (Institute of Computing Technology, Chinese Academy of Sciences), Tianyue Lu (Institute of Computing Technology, Chinese Academy of Sciences), and Mingyu Chen (Institute of Computing Technology, Chinese Academy of Sciences) |
| 12:00 | Closing |
  
**Tuesday, November 4: AI's Memory Challenges**
Jae W. Lee, Seoul National University
  
  
Modern AI systems face a fundamental shift: memory and storage access, not computation, are becoming the primary performance bottleneck. Transformer architectures with O(L²) self-attention complexity struggle with bandwidth constraints, while the expanding contexts of agentic AI, represented explicitly in token space as the KV cache, intensify capacity pressure. Memory reliability is also becoming a more critical concern, especially in the context of long-running training. This talk explores the full spectrum of challenges reshaping AI platforms and several avenues for turning these bottlenecks into breakthroughs. I’ll discuss recent solutions spanning hardware–software co-design, bandwidth-efficient algorithms, Processing-in-Memory/Near-Data Processing, and NAND flash’s evolution into specialized "AI memory" for KV caches and vector databases. Finally, I’ll explore how memory-centric design will define scalable AI infrastructure, and what opportunities lie ahead for systems researchers and architects.
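To make the capacity pressure concrete, here is a back-of-the-envelope KV-cache sizing sketch. The model shape assumed below (32 decoder layers, 8 grouped-query KV heads, head dimension 128, fp16 values) is a hypothetical example chosen for illustration, not a figure from the talk:

```python
# Rough KV-cache sizing for long-context LLM inference.
# All model parameters below are illustrative assumptions.

BYTES_PER_ELEM = 2   # fp16
NUM_LAYERS = 32      # hypothetical decoder depth
NUM_KV_HEADS = 8     # hypothetical grouped-query KV heads
HEAD_DIM = 128       # hypothetical per-head dimension

def kv_cache_bytes(seq_len: int) -> int:
    """Two tensors (K and V) per layer, each holding
    seq_len x NUM_KV_HEADS x HEAD_DIM elements."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * seq_len * BYTES_PER_ELEM

for tokens in (8_000, 128_000, 1_000_000):
    print(f"{tokens:>9,} tokens -> {kv_cache_bytes(tokens) / 2**30:6.1f} GiB")
```

Under these assumptions the cache grows linearly at 128 KiB per token, reaching roughly 122 GiB at one million tokens, more than most single GPUs can spare once model weights are resident; this is the regime the CXL- and PIM-based papers in Session 1 target.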
  
  
Jae W. Lee is a Professor of Computer Science and Engineering and the Director of the AI Institute at Seoul National University (SNU). His research focuses on computer architecture, systems, parallel programming, and hardware security, with recent emphasis on memory-centric hardware/software co-design for AI. His work has been recognized with various awards and honors, including the IEEE Symposium on VLSI Circuits Most Frequently Cited Paper Award in 30 Years (2017), the ACM ASPLOS Most Influential Paper Award (2014), two Google Research Awards (2024, 2025), the ISCA Hall of Fame (2021), the ASPLOS Hall of Fame (2021), two IEEE Micro "Top Picks" selections (2020), the ACM ASPLOS "Highlights" Paper (2017), the HiPEAC Paper Award at ACM PLDI (2012), and the IEEE PACT Top Paper (2010). He has served as Program Co-Chair of the IEEE Micro "Top Picks" special issue (2023) and the ACM International Symposium on Memory Management (ISMM) (2024), General Chair of the International Symposium on Code Generation and Optimization (CGO) (2021, 2022), and as a PC member of numerous top-tier computer architecture and systems conferences, such as ISCA, ASPLOS, MICRO, MLSys, HPCA, SC, and Hot Chips. He spent a year as a visiting faculty researcher at Google DeepMind in Mountain View, CA, in 2022–2023. Before joining SNU, he was a research associate at Princeton University and a researcher and engineer at Parakinetics, Inc., where he conducted research on multicore software optimization. He also contributed to multiple successful VLSI implementation projects, including the Physical Uncloneable Function (PUF) and RAW Microprocessor at MIT. He received his B.S. in EE from SNU, his M.S. in EE from Stanford, and his Ph.D. in CS from MIT.
  
  
**Wednesday, November 5: Efficient Big Graph Analytics Via Redundancy Reduction**
Rajiv Gupta, University of California, Riverside
  
  
Analyses of large graphs are an increasingly important computational workload, as graph analytics is employed in many domains. A significant amount of research in this area has therefore focused on developing frameworks that leverage the parallelism available on various hardware platforms, ranging from a single GPU or multicore server to a cluster of servers and/or GPUs. In this talk, I will describe our work, which combines parallelism with a complementary approach that comprehensively reduces redundancy to improve scalability. Redundancy can be found and removed not only in the computation and propagation of values, but also in graph traversal and data transfer across the memory hierarchy. Our work applies redundancy reduction to two major graph analytics scenarios, involving static (fixed) graphs and evolving (changing) graphs, and achieves substantial performance improvements.
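As a generic illustration of one redundancy source the abstract mentions (repeated work in the computation and propagation of values), the toy sketch below computes each distinct in-neighbor sum only once and shares it among vertices whose in-neighbor sets coincide. It is a made-up example of the general idea, not code from the speaker's systems:

```python
# Toy redundancy reduction in value propagation: vertices with identical
# in-neighbor sets receive identical propagated sums, so each distinct
# sum is computed once and shared. Illustrative example only.
from collections import defaultdict

# Hypothetical directed graph: vertex -> list of in-neighbors.
in_neighbors = {
    "a": ["c", "d"],
    "b": ["c", "d"],  # same in-neighbors as "a": recomputing its sum is redundant
    "c": ["d"],
    "d": [],
}
value = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}

def propagate_once(in_neighbors, value):
    """One round of value propagation that shares work across vertices
    with identical in-neighbor sets."""
    groups = defaultdict(list)
    for v, nbrs in in_neighbors.items():
        groups[tuple(sorted(nbrs))].append(v)
    new_value = {}
    for nbr_set, members in groups.items():
        s = sum(value[u] for u in nbr_set)  # computed once per distinct set
        for v in members:
            new_value[v] = s
    return new_value

print(propagate_once(in_neighbors, value))
# {'a': 7.0, 'b': 7.0, 'c': 4.0, 'd': 0}
```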
  
  
Rajiv is a Distinguished Professor and the Amrik Singh Poonian Chair in Computer Science at the University of California, Riverside. His research interests include compilers, architectures, and runtimes for parallel systems. He has co-authored over 300 papers, with more than 16,000 citations and an h-index of 69. He has supervised 42 Ph.D. dissertations, including those of two ACM SIGPLAN Outstanding Doctoral Dissertation Award winners. Rajiv is a Fellow of the IEEE, ACM, and AAAS. He received the NSF Presidential Young Investigator Award and the UCR Doctoral Dissertation Advisor/Mentor Award. He has chaired several major ACM/IEEE conferences, including FCRC, PLDI, HPCA, ASPLOS, PPoPP, and PACT. Rajiv also served as a member of a technical advisory group on networking and information technology established by the U.S. President’s Council of Advisors on Science and Technology.