Accepted Papers
Note: Some papers are still undergoing shepherding.
-
A Stable Marriage Requires a Shared Residence with Low Contention and Mutual Complementarity
Jiaxin Liu (The Ohio State University), Rubao Lee (Freelance), Cathy Xia (The Ohio State University), Xiaodong Zhang (The Ohio State University)
-
Accelerating DFS-based Subgraph Matching on GPU via Reusing Intersection
Chen Chen (National University of Defense Technology), Shanzhi Gu (National University of Defense Technology), Junsheng Chang (National University of Defense Technology), Li Shen (National University of Defense Technology)
-
ANG: Accelerating NFA processing on GPUs via Exploring Multi-Level Fine-Grained Parallelism
Yuguang Wang (Michigan Technological University), Yunmo Zhang (City University of Hong Kong), Zeyu Liu (City University of Hong Kong), Junqiao Qiu (City University of Hong Kong), Zhenlin Wang (Michigan Technological University)
-
Automatic Code-Generation for Accelerating Structured-Mesh-Based Explicit Numerical Solvers on FPGAs
Beniel Thileepan (Department of Computer Science, University of Warwick), Suhaib A Fahmy (King Abdullah University of Science and Technology (KAUST)), Gihan R Mudalige (Department of Computer Science, University of Warwick)
-
Automatic Generation of Actor-based Parallelism from Shared Memory Parallel Programs
Jun Shirako (Georgia Institute of Technology), Vivek Sarkar (Georgia Institute of Technology)
-
Bancroft: Genomics Acceleration Beyond On-Device Memory
Se-Min Lim (University of California, Irvine), Seongyoung Kang (University of California, Irvine), Sang-Woo Jun (University of California, Irvine)
-
Bit-Level Semantics: Scalable RAG Retrieval with Neurosymbolic Hyperdimensional Computing
Hyunsei Lee (DGIST), Shinhyoung Jang (DGIST), Jaewoo Gwak (DGIST), Jongho Park (DGIST), Yeseong Kim (DGIST)
-
Cache Miss Curve Analysis via Cardinality Domain
Eishi Arima (Technical University of Munich), Martin Schulz (Technical University of Munich)
-
Conversational Auto-Scheduling: An Experimental Study of LLM-Based Loop Optimization
Massinissa Merouani (New York University Abu Dhabi), Islem Kara Bernou (New York University Abu Dhabi), Riyadh Baghdadi (New York University Abu Dhabi)
-
CoroAMU: Unleashing Memory-Driven Coroutines through Latency-Aware Decoupled Operations
Zhuolun Jiang (Institute of Computing Technology, Chinese Academy of Sciences), Songyue Wang (Institute of Computing Technology, Chinese Academy of Sciences), Xiaokun Pei (Institute of Computing Technology, Chinese Academy of Sciences), Tianyue Lu (Institute of Computing Technology, Chinese Academy of Sciences), Mingyu Chen (Institute of Computing Technology, Chinese Academy of Sciences)
-
CPC: Coordinated Page Cache for Serverless Computing
Keun Soo Lim (Seoul National University), Yunjay Hong (Seoul National University), Jongheon Jeong (Seoul National University), Sam Son (UC Berkeley), Donguk Kim (Seoul National University), Yeonhong Park (Seoul National University), Jae W. Lee (Seoul National University), Jinkyu Jeong (Yonsei University)
-
DMO-DB: Mitigating the Data Movement Bottlenecks of GPU-Accelerated Relational OLAP
Chaemin Lim (Yonsei University), Suhyun Lee (Yonsei University), Jinwoo Choi (Yonsei University), Joonsung Kim (Sungkyunkwan University), Jinho Lee (Seoul National University), Youngsok Kim (Yonsei University)
-
Doppeladler: Adaptive Tensor Parallelism for Latency-Critical LLM Deployment on CPU-GPU Integrated End-User Device
Jiazhi Jiang (Sun Yat-sen University), Xiao Liu (Sun Yat-sen University), Jiangsu Du (Sun Yat-sen University), Dan Huang (Sun Yat-sen University), Yutong Lu (Sun Yat-sen University)
-
EARTH: Efficient Architecture for RISC-V Vector Memory Access
Hongyi Guan (Tsinghua University), Yichuan Gao (Intel Labs China), Chenlu Miao (Intel Labs China), Haoyang Wu (Intel Labs China), Hang Zhu (Independent Researcher), Mingfeng Lin (Shenzhen University), Huayue Liang (Intel Labs China)
-
Energy-Efficient Acceleration of Hash-Based Post-Quantum Cryptographic Schemes on Embedded Spatial Architectures
Yanze Wu (George Mason University), Md Tanvir Arafin (George Mason University)
-
Exploring Memory Tiering Systems in the CXL Era via FPGA-based Emulation and Device-Side Management
Yiqi Chen (Peking University), Xiping Dong (Peking University), Zhe Zhou (Peking University), Zhao Wang (Peking University), Jie Zhang (Peking University), Guangyu Sun (Peking University)
-
Fine-Grained Fusion: The Missing Piece in Area-Efficient State Space Model Acceleration
Robin Geens (MICAS (KU Leuven)), Arne Symons (MICAS (KU Leuven)), Marian Verhelst (MICAS (KU Leuven))
-
FLASH: An Abstract Machine for Modeling Fully Homomorphic Encryption Accelerators
Alireza Tabatabaeian (Simon Fraser University), Arrvindh Shriraman (Simon Fraser University)
-
Generating Two-Level, GPU-Aware Mappings for Distributed Tensor Computations
Botao Wu (The Ohio State University), Martin Kong (The Ohio State University)
-
GPU Stream-Aware Communication for Effective Pipelining
Naveen Namashivayam (University of Minnesota at Twin Cities), Krishna Kandalla (Hewlett Packard Enterprise), Pen-Chung Yew (University of Minnesota at Twin Cities), James B White III (Hewlett Packard Enterprise), Larry Kaplan (Hewlett Packard Enterprise), Mark Pagel (Hewlett Packard Enterprise)
-
Guess, Measure & Edit: Using Lowering to Lift Tensor Code
José Wesley De Souza Magalhães (University of Edinburgh), Jackson Woodruff (University of Edinburgh), Jordi Armengol-Estapé (University of Edinburgh), Alexander Brauckmann (University of Edinburgh), Luc Jaulmes (University of Edinburgh), Elizabeth Polgreen (University of Edinburgh), Michael O'Boyle (University of Edinburgh)
-
Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations
Yujeong Choi (Google), John Kim (KAIST), Minsoo Rhu (KAIST)
-
LibraPIM: Dynamic Load Rebalancing to Maximize Utilization in PIM-Assisted LLM Inference Systems
Hyeongjun Cho (Sungkyunkwan University), Yoonho Jang (Sungkyunkwan University), Hyungi Kim (Sungkyunkwan University), Seongwook Kim (Sungkyunkwan University), Keewon Kwon (Sungkyunkwan University), Gwangsun Kim (POSTECH), Seokin Hong (Sungkyunkwan University)
-
LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers
Massinissa Merouani (New York University Abu Dhabi), Khaled Afif Boudaoud, Iheb Nassim Aouadj (New York University Abu Dhabi), Nassim Tchoulak (École Nationale Supérieure d'Informatique), Islem Kara Bernou (École Nationale Supérieure d'Informatique), Hamza Benyamina (New York University Abu Dhabi), Fatima Benbouzid-Si Tayeb (École Nationale Supérieure d'Informatique), Karima Benatchba (École Nationale Supérieure d'Informatique), Hugh Leather (Meta), Riyadh Baghdadi (New York University Abu Dhabi)
-
Multiway Merge Partitioning for Sparse-Sparse Matrix Multiplication on GPUs
Eric Lorimer (Georgia Institute of Technology), Ruobing Han (Georgia Institute of Technology), Sung Ha Kang (Georgia Institute of Technology), Hyesoon Kim (Georgia Institute of Technology)
-
Optimize Winograd Convolution for a Novel MIMD Many-core Architecture PEZY-SC3s
Yi Zhou (National University of Defense Technology), Qinglin Wang (National University of Defense Technology), Lian Wang (Shanxi Supercomputing Center), Zhiyan Liu (Shanxi Supercomputing Center), Bingwei Wang (Shanxi Supercomputing Center), Feiming Liu (Shanxi Supercomputing Center), Xiangdong Pei (Shanxi Supercomputing Center), Jie Liu (National University of Defense Technology)
-
Optimizing 3D Gaussian Splattering for Mobile GPUs
Md Musfiqur Rahman Sanim (University of Georgia), Zhihao Shu (University of Georgia), Bahram Afsharmanesh (University of Georgia), AmirAli Mirian (University of Georgia), Jiexiong Guan (William & Mary), Wei Niu (University of Georgia), Bin Ren (William & Mary), Gagan Agrawal (University of Georgia)
-
Salient Store: Enabling Smart Storage for Continuous Learning Edge Servers
Cyan Subhra Mishra (The Pennsylvania State University), Deeksha Chaudhary (The Pennsylvania State University), Mahmut T Kandemir (The Pennsylvania State University), Chita Das (The Pennsylvania State University)
-
Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits
Dowon Kim (Hanyang University), Minjae Lee (Hanyang University), Janghyeon Kim (Hanyang University), HyuckSung Kwon (Hanyang University), Hyeonggyu Jeong (Hanyang University), Sang-Soo Park (Samsung Electronics), Minyong Yoon (Samsung Electronics), Si-Dong Roh (Samsung Electronics), Jinin So (Samsung Electronics), Jungwook Choi (Hanyang University)
-
ScaleMoE: A Fast and Scalable Distributed Training Framework for Large-Scale Mixture-of-Experts Models
Seohong Choi (Sungkyunkwan University), Huize Hong (Sungkyunkwan University), Taehee Han (Sungkyunkwan University), Joonsung Kim (Sungkyunkwan University)
-
SCREME: A Scalable Framework for Resilient Memory Design
Fan Li (University of Central Florida), Mimi Xie (University of Texas at San Antonio), Yanan Guo (University of Rochester), Huize Li (University of Central Florida), Xin Xin (University of Central Florida)
-
SPipe: Hybrid GPU and CPU Pipeline for Training LLMs under Memory Pressure
Junyeol Ryu (Seoul National University), Yujin Jeong (Seoul National University), Daeyoung Park (Seoul National University), Jinpyo Kim (Seoul National University), Heehoon Kim (Seoul National University), Jaejin Lee (Seoul National University)
-
Squire: A General-Purpose Accelerator to Exploit Fine-Grain Parallelism on Dependency-Bound Kernels
Rubén Langarita (Barcelona Supercomputing Center), Pablo Ibáñez-Marín (Universidad de Zaragoza), Jesús Alastruey-Benedé (Universidad de Zaragoza), Miquel Moreto (UPC/BSC), Santiago Marco-Sola (Universitat Politècnica de Catalunya - Barcelona Supercomputing Center), Adrià Armejach (Barcelona Supercomputing Center)
-
XPE-Point: Simulator-Agnostic Sample Selection Methodology for Heterogeneous CPU-GPU Applications
Alen Sabu (Arm), Harish Patil (Intel), Wim Heirman (Intel), Changxi Liu (National University of Singapore), Trevor E. Carlson (National University of Singapore)
Important Dates and Deadlines
Conference Registration:
Early registration deadline: September 23, 2025
Conference Papers: (extended submission deadlines!)
ACM SRC:
Abstract Submission Deadline: September 12, 2025 AoE (UTC-12)
Author Notification: September 21, 2025
Conference: November 3-6, 2025