PACT 2025
November 3-6, 2025

Schedule

Monday, November 3

Time What
08:30-12:00 Tutorial: CEDR: A Holistic Software and Hardware Design Environment for Hardware Agnostic Application Development and Deployment on FPGA-Integrated Heterogeneous Systems
Serhan Gener, Sahil Hassan, Ali Akoglu (Department of Electrical and Computer Engineering, University of Arizona)
12:00 Lunch
13:00-18:00 Tutorial: SODA Synthesizer: Accelerating Artificial Intelligence Applications with an End-to-End Silicon Compiler
Bohm Agostini (PNNL), Vito Giovanni Castellana (PNNL), Fabrizio Ferrandi (Politecnico di Milano), Giovanni Gozzi (Politecnico di Milano), Ankur Limaye (PNNL), Antonino Tumeo (PNNL)

Tuesday, November 4

Time What
08:00 Opening
08:15 Keynote: AI's Memory Challenges, Jae W. Lee, Seoul National University
09:15 Break
09:45 Session 1: LLM Systems at Scale
  • Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits
    Dowon Kim (Hanyang University), Minjae Lee (Hanyang University), Janghyeon Kim (Hanyang University), HyuckSung Kwon (Hanyang University), Hyeonggyu Jeong (Hanyang University), Sang-Soo Park (Samsung Electronics), Minyong Yoon (Samsung Electronics), Si-Dong Roh (Samsung Electronics), Jinin So (Samsung Electronics), Jungwook Choi (Hanyang University), and Yongsuk Kwon (Samsung Electronics)
  • SPipe: Hybrid GPU and CPU Pipeline for Training LLMs under Memory Pressure
    Junyeol Ryu (Seoul National University), Yujin Jeong (Seoul National University), Daeyoung Park (Seoul National University), Jinpyo Kim (Seoul National University), Heehoon Kim (Seoul National University), and Jaejin Lee (Seoul National University)
  • ScaleMoE: A Fast and Scalable Distributed Training Framework for Large-Scale Mixture-of-Experts Models
    Seohong Choi (Sungkyunkwan University), Huize Hong (Sungkyunkwan University), Tae Hee Han (Sungkyunkwan University), and Joonsung Kim (Sungkyunkwan University)
  • LibraPIM: Dynamic Load Rebalancing to Maximize Utilization in PIM-Assisted LLM Inference Systems
    Hyeongjun Cho (Sungkyunkwan University), Yoonho Jang (Sungkyunkwan University), Hyungi Kim (Sungkyunkwan University), Seongwook Kim (Sungkyunkwan University), Keewon Kwon (Sungkyunkwan University), Gwangsun Kim (POSTECH), and Seokin Hong (Sungkyunkwan University)
  • Doppeladler: Adaptive Tensor Parallelism for Latency-Critical LLM Deployment on CPU-GPU Integrated End-User Device
    Jiazhi Jiang (Sun Yat-sen University), Xiao Liu (Sun Yat-sen University), Jiangsu Du (Sun Yat-sen University), Dan Huang (Sun Yat-sen University), and Yutong Lu (Sun Yat-sen University)
12:00 Lunch
13:15 Session 2: Memory Systems & Caching
  • Exploring Memory Tiering Systems in the CXL Era via FPGA-based Emulation and Device-Side Management
    Yiqi Chen (Peking University), Xiping Dong (Peking University), Zhe Zhou (Peking University), Zhao Wang (Peking University), Jie Zhang (Peking University), and Guangyu Sun (Peking University)
  • CPC: Coordinated Page Cache for Serverless Computing
    Keun Soo Lim (Seoul National University), Yunjay Hong (Seoul National University), Jongheon Jeong (Seoul National University), Sam Son (UC Berkeley), Donguk Kim (Seoul National University), Yeonhong Park (Seoul National University), Jae W. Lee (Seoul National University), and Jinkyu Jeong (Yonsei University)
  • SCREME: A Scalable Framework for Resilient Memory Design
    Fan Li (University of Central Florida), Mimi Xie (University of Texas at San Antonio), Yanan Guo (University of Rochester), Huize Li (University of Central Florida), and Xin Xin (University of Central Florida)
  • Cache Miss Curve Analysis via Cardinality Domain
    Eishi Arima (Technical University of Munich) and Martin Schulz (Technical University of Munich)
  • EARTH: Efficient Architecture for RISC-V Vector Memory Access
    Hongyi Guan (Tsinghua University), Yichuan Gao (Intel Labs China), Chenlu Miao (Intel Labs China), Haoyang Wu (Intel Labs China), Hang Zhu (Independent Researcher), Mingfeng Lin (Shenzhen University), and Huayue Liang (Intel Labs China)
15:30 Break
16:00 Session 3: GPU Algorithms for Irregular Workloads & OLAP
  • ANG: Accelerating NFA processing on GPUs via Exploring Multi-Level Fine-Grained Parallelism
    Yuguang Wang (Michigan Technological University), Yunmo Zhang (City University of Hong Kong), Zeyu Liu (City University of Hong Kong), Junqiao Qiu (City University of Hong Kong), and Zhenlin Wang (Michigan Technological University)
  • Accelerating DFS-based Subgraph Matching on GPU via Reusing Intersection
    Chen Chen (National University of Defense Technology), Shanzhi Gu (National University of Defense Technology), Junsheng Chang (National University of Defense Technology), and Li Shen (National University of Defense Technology)
  • Multiway Merge Partitioning for Sparse-Sparse Matrix Multiplication on GPUs
    Eric Lorimer (Georgia Institute of Technology), Ruobing Han (Georgia Institute of Technology), Sung Ha Kang (Georgia Institute of Technology), and Hyesoon Kim (Georgia Institute of Technology)
  • DMO-DB: Mitigating the Data Movement Bottlenecks of GPU-Accelerated Relational OLAP
    Chaemin Lim (Yonsei University), Suhyun Lee (Yonsei University), Jinwoo Choi (Yonsei University), Joonsung Kim (Sungkyunkwan University), Jinho Lee (Seoul National University), and Youngsok Kim (Yonsei University)
18:00 Reception and Posters

Wednesday, November 5

Time What
08:15 Keynote: Efficient Big Graph Analytics Via Redundancy Reduction, Rajiv Gupta, University of California, Riverside
09:15 Break
09:45 Session 4: Compilers & Program Generation
  • LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers
    Massinissa Merouani (New York University Abu Dhabi), Afif Boudaoud (New York University Abu Dhabi), Iheb Nassim Aouadj (New York University Abu Dhabi), Nassim Tchoulak (École Nationale Supérieure d'Informatique), Islem Kara Bernou (École Nationale Supérieure d'Informatique), Hamza Benyamina (New York University Abu Dhabi), Fatima Benbouzid-Si Tayeb (École Nationale Supérieure d'Informatique), Karima Benatchba (École Nationale Supérieure d'Informatique), Hugh Leather (Meta), and Riyadh Baghdadi (New York University Abu Dhabi)
  • Agentic Auto-Scheduling: An Experimental Study of LLM-Based Loop Optimization
    Massinissa Merouani (New York University Abu Dhabi), Islem Kara Bernou (New York University Abu Dhabi), and Riyadh Baghdadi (New York University Abu Dhabi)
  • Guess, Measure & Edit: Using Lowering to Lift Tensor Code
    José Wesley De Souza Magalhães (University of Edinburgh), Jackson Woodruff (University of Edinburgh), Jordi Armengol-Estapé (University of Edinburgh), Alexander Brauckmann (University of Edinburgh), Luc Jaulmes (University of Edinburgh), Elizabeth Polgreen (University of Edinburgh), and Michael O'Boyle (University of Edinburgh)
  • Automatic Generation of Actor-based Parallelism from Shared Memory Parallel Programs
    Jun Shirako (Georgia Institute of Technology) and Vivek Sarkar (Georgia Institute of Technology)
  • Automatic Code-Generation for Accelerating Structured-Mesh-Based Explicit Numerical Solvers on FPGAs
    Beniel Thileepan (Department of Computer Science, University of Warwick), Suhaib A Fahmy (King Abdullah University of Science and Technology (KAUST)), and Gihan R Mudalige (Department of Computer Science, University of Warwick)
12:00 Lunch / Business Meeting
13:15 Session 5: Specialized Accelerators
  • FLASH: An Abstract Machine for Modeling Fully Homomorphic Encryption Accelerators
    Alireza Tabatabaeian (Simon Fraser University) and Arrvindh Shriraman (Simon Fraser University)
  • Energy-Efficient Acceleration of Hash-Based Post-Quantum Cryptographic Schemes on Embedded Spatial Architectures
    Yanze Wu (George Mason University) and Md Tanvir Arafin (George Mason University)
  • Fine-Grained Fusion: The Missing Piece in Area-Efficient State Space Model Acceleration
    Robin Geens (MICAS, KU Leuven), Arne Symons (MICAS, KU Leuven), and Marian Verhelst (MICAS, KU Leuven)
  • Squire: A General-Purpose Accelerator to Exploit Fine-Grain Parallelism on Dependency-Bound Kernels
    Rubén Langarita (Barcelona Supercomputing Center), Pablo Ibáñez-Marín (Universidad de Zaragoza), Jesús Alastruey-Benedé (Universidad de Zaragoza), Miquel Moreto (UPC/BSC), Santiago Marco-Sola (Universitat Politècnica de Catalunya - Barcelona Supercomputing Center), and Adrià Armejach (Barcelona Supercomputing Center)
  • Bancroft: Genomics Acceleration Beyond On-Device Memory
    Se-Min Lim (University of California, Irvine), Seongyoung Kang (University of California, Irvine), and Sang-Woo Jun (University of California, Irvine)
15:30 Break
16:00 Session 6: Edge & Mobile AI Systems
  • Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations
    Yujeong Choi (Google), John Kim (KAIST), and Minsoo Rhu (KAIST)
  • Salient Store: Enabling Smart Storage for Continuous Learning Edge Servers
    Cyan Subhra Mishra (The Pennsylvania State University), Deeksha Chaudhary (The Pennsylvania State University), Mahmut T Kandemir (The Pennsylvania State University), and Chita Das (The Pennsylvania State University)
  • Bit-Level Semantics: Scalable RAG Retrieval with Neurosymbolic Hyperdimensional Computing
    Hyunsei Lee (DGIST), Shinhyoung Jang (DGIST), Jaewoo Gwak (DGIST), Jongho Park (DGIST), and Yeseong Kim (DGIST)
  • Optimizing 3D Gaussian Splatting for Mobile GPUs
    Md Musfiqur Rahman Sanim (University of Georgia), Zhihao Shu (University of Georgia), Bahram Afsharmanesh (University of Georgia), AmirAli Mirian (University of Georgia), Jiexiong Guan (William & Mary), Wei Niu (University of Georgia), Bin Ren (William & Mary), and Gagan Agrawal (University of Georgia)

Thursday, November 6

Time What
08:15 Session 7: Communication, Profiling & Mapping across CPU–GPU Clusters
  • Generating Two-Level, GPU-Aware Mappings for Distributed Tensor Computations
    Botao Wu (The Ohio State University) and Martin Kong (The Ohio State University)
  • GPU Stream-Aware Communication for Effective Pipelining
    Naveen Namashivayam (University of Minnesota at Twin Cities), Krishna Kandalla (Hewlett Packard Enterprise), Pen-Chung Yew (University of Minnesota at Twin Cities), James B White III (Hewlett Packard Enterprise), Larry Kaplan (Hewlett Packard Enterprise), and Mark Pagel (Hewlett Packard Enterprise)
  • TPE: XPU-Point: Simulator-Agnostic Sample Selection Methodology for Heterogeneous CPU-GPU Applications
    Alen Sabu (Arm), Harish Patil (Intel), Wim Heirman (Intel), Changxi Liu (National University of Singapore), and Trevor E. Carlson (National University of Singapore)
09:30 Break
09:45 SRC Poster Presentations
10:45 Session 8: Novel Parallel Architectures & Runtime Mechanisms
  • A Stable Marriage Requires a Shared Residence with Low Contention and Mutual Complementarity
    Jiaxin Liu (The Ohio State University), Rubao Lee (Freelance), Cathy Xia (The Ohio State University), and Xiaodong Zhang (The Ohio State University)
  • Optimize Winograd Convolution for a Novel MIMD Many-core Architecture PEZY-SC3s
    Yi Zhou (National University of Defense Technology), Qinglin Wang (National University of Defense Technology), Lian Wang (Shanxi Supercomputing Center), Zhiyan Liu (Shanxi Supercomputing Center), Bingwei Wang (Shanxi Supercomputing Center), Feiming Liu (Shanxi Supercomputing Center), Xiangdong Pei (Shanxi Supercomputing Center), and Jie Liu (National University of Defense Technology)
  • CoroAMU: Unleashing Memory-Driven Coroutines through Latency-Aware Decoupled Operations
    Zhuolun Jiang (Institute of Computing Technology, Chinese Academy of Sciences), Songyue Wang (Institute of Computing Technology, Chinese Academy of Sciences), Xiaokun Pei (Institute of Computing Technology, Chinese Academy of Sciences), Tianyue Lu (Institute of Computing Technology, Chinese Academy of Sciences), and Mingyu Chen (Institute of Computing Technology, Chinese Academy of Sciences)
12:00 Closing

Keynotes

Tuesday, November 4: AI's Memory Challenges

Jae W. Lee, Seoul National University
 
Modern AI systems face a fundamental shift: memory and storage access, not computation, are becoming the primary performance bottleneck. Transformer architectures with O(L²) self-attention complexity struggle with bandwidth constraints, while agentic AI's expanding context demands, with explicit token-space representation (KV cache), intensify capacity pressure. Memory reliability is also becoming a more critical concern, especially in the context of long-running training. This talk surveys the full spectrum of challenges reshaping AI platforms and several avenues to turn these bottlenecks into breakthroughs. I'll discuss recent solutions spanning hardware-software co-design, bandwidth-efficient algorithms, Processing-in-Memory/Near-Data Processing, and NAND flash's evolution into specialized "AI memory" for KV caches and vector databases. The talk concludes with how memory-centric design will define scalable AI infrastructure, and what opportunities lie ahead for systems researchers and architects.

Jae W. Lee is a Professor of Computer Science and Engineering and the Director of the AI Institute at Seoul National University (SNU). His research focuses on computer architecture, systems, parallel programming, and hardware security, with recent emphasis on memory-centric hardware/software co-design for AI. His work has been recognized with various awards and honors, including the IEEE Symposium on VLSI Circuits Most Frequently Cited Paper Award in 30 Years (2017), the ACM ASPLOS Most Influential Paper Award (2014), two Google Research Awards (2024, 2025), the ISCA Hall of Fame (2021), the ASPLOS Hall of Fame (2021), two IEEE Micro "Top Picks" selections (2020), the ACM ASPLOS "Highlights" Paper (2017), the HiPEAC Paper Award at ACM PLDI (2012), and the IEEE PACT Top Paper (2010). He has served as Program Co-Chair of the IEEE Micro "Top Picks" special issue (2023) and the ACM International Symposium on Memory Management (ISMM) (2024), General Chair of the International Symposium on Code Generation and Optimization (CGO) (2021, 2022), and as a PC member of numerous top-tier computer architecture and systems conferences, such as ISCA, ASPLOS, MICRO, MLSys, HPCA, SC, and Hot Chips. He spent a year as a visiting faculty researcher at Google DeepMind in Mountain View, CA, in 2022–2023. Before joining SNU, he was a research associate at Princeton University and a researcher and engineer at Parakinetics, Inc., where he conducted research on multicore software optimization. He also contributed to multiple successful VLSI implementation projects, including the Physical Uncloneable Function (PUF) and RAW Microprocessor at MIT. He received his B.S. in EE from SNU, his M.S. in EE from Stanford, and his Ph.D. in CS from MIT.

Wednesday, November 5: Efficient Big Graph Analytics Via Redundancy Reduction

Rajiv Gupta, University of California, Riverside
 
Analyses on large graphs are an increasingly important computational workload, as graph analytics is employed in many domains. Therefore, a significant amount of research in this area has focused on developing frameworks that leverage the parallelism available on various hardware platforms, ranging from a single GPU or multicore server to a cluster of servers and/or GPUs. In this talk, I will describe our work, which combines parallelism with a complementary approach that comprehensively reduces redundancy to improve scalability. Redundancy can be found and removed not only from the computation and propagation of values, but also from graph traversal and data transfer across the memory hierarchy. Our work applies redundancy reduction to two major graph analytics scenarios, involving static (fixed) graphs and evolving (changing) graphs, and achieves substantial performance improvements.

Rajiv is a Distinguished Professor and the Amrik Singh Poonian Chair in Computer Science at the University of California, Riverside. His research interests include compilers, architectures, and runtimes for parallel systems. He has co-authored over 300 papers, with more than 16,000 citations and an h-index of 69. He has supervised 42 Ph.D. dissertations, including those of two ACM SIGPLAN Outstanding Doctoral Dissertation Award winners. Rajiv is a Fellow of the IEEE, ACM, and AAAS. He received the NSF Presidential Young Investigator Award and the UCR Doctoral Dissertation Advisor/Mentor Award. He has chaired several major ACM/IEEE conferences, including FCRC, PLDI, HPCA, ASPLOS, PPoPP, and PACT. Rajiv also served as a member of a technical advisory group on networking and information technology established by the U.S. President’s Council of Advisors on Science and Technology.

Important Dates and Deadlines

Conference Registration:

  • Early registration deadline: September 23, 2025

Conference Papers: (extended submission deadlines!)

ACM SRC:

  • Abstract Submission Deadline: September 12, 2025 AoE (UTC-12)
  • Author Notification: September 21, 2025

Conference: November 3-6, 2025


Sponsors

Bronze

Supporters


Previous PACTs

Earlier PACTs