Sr. Silicon Design Engineer · AMD

Kaustubh Shivdikar

Hardware architect at AMD designing next-generation GPU microarchitecture. Ph.D. in Computer Engineering from Northeastern University, with research spanning GNN acceleration, homomorphic encryption on GPUs, and hardware-software co-design for emerging workloads.

Cambridge, MA
01 — News

Latest

Award
January 2025
GME named Top Pick in Hardware & Embedded Security 2024
Our MICRO 2023 paper “GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption” was selected as a Top Pick in Hardware and Embedded Security for 2024, with an invitation to submit to the journal special issue.
March 2025
PIMnet accepted at HPCA 2025
“PIMnet: A Domain-Specific Network for Efficient Collective Communication in Scalable PIM” accepted at the 31st International Symposium on High Performance Computer Architecture (HPCA), Las Vegas.
02 — Research

Areas of Focus

I

GPU Microarchitecture

Designing next-generation GPU compute architectures at AMD. Hardware-software co-design for CDNA/RDNA series targeting ML inference, training, and HPC workloads.

II

Graph Neural Networks

NeuraChip — a spatial accelerator using Gustavson’s algorithm with decoupled multiply-add and hash-based load balancing. 22x speedup over vendor sparse libraries.

III

Homomorphic Encryption

GME — GPU-based microarchitectural extensions for FHE. Polynomial multiplication optimization and Barrett reduction on CDNA GPUs. Top Pick in HES 2024.

IV

State Space Models

Characterizing Mamba-based SSM training on GPUs, building workload suites that span model architectures to guide hardware optimization decisions.

V

DNN Security

Reverse engineering DNN architectures by exploiting JIT GEMM code caches to extract CNN model hyperparameters — exposing side-channel vulnerabilities in ML inference.

VI

Processing-in-Memory

Evaluating real PIM architectures using UPMEM systems. PIMnet — domain-specific interconnection for efficient collective communication in scalable PIM.

03 — Publications

Selected Works

HPCA 2025
PIMnet: A Domain-Specific Network for Efficient Collective Communication in Scalable PIM
H. Son, G. Jonatan, X. Wu, H. Cho, K. Shivdikar, J.L. Abellán, A. Joshi, D. Kaeli, J. Kim
ISCA 2024
NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator
K. Shivdikar, N.B. Agostini, M. Jayaweera, G. Jonatan, J.L. Abellán, A. Joshi, J. Kim, D. Kaeli
SIGMETRICS 2024
Scalability Limitations of Processing-in-Memory using Real System Evaluations
G. Jonatan, H. Cho, H. Son, X. Wu, N. Livesay, E. Mora, K. Shivdikar, J.L. Abellán, A. Joshi, D. Kaeli, J. Kim
ASPLOS 2024
MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training
H. Peng, X. Xie, K. Shivdikar, A. Hasan, J. Zhao, S. Huang, O. Khan, D. Kaeli, C. Ding
MICRO 2023
GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption
K. Shivdikar, Y. Bao, R. Agrawal, M. Shen, G. Jonatan, E. Mora, A. Ingare, N. Livesay, J.L. Abellán, J. Kim, A. Joshi, D. Kaeli
♦ Top Pick in HES 2024
IEEE Micro 2023
Accelerating Finite Field Arithmetic for Homomorphic Encryption on GPUs
N. Livesay, G. Jonatan, E. Mora, K. Shivdikar, R. Agrawal, A. Joshi, J.L. Abellán, J. Kim, D. Kaeli
SEED 2022
Accelerating Polynomial Multiplication for Homomorphic Encryption on GPUs
K. Shivdikar, G. Jonatan, E. Mora, N. Livesay, R. Agrawal, A. Joshi, J.L. Abellán, J. Kim, D. Kaeli
SEED 2021
JAXED: Reverse Engineering DNN Architectures Leveraging JIT GEMM Libraries
M. Jayaweera, K. Shivdikar, Y. Wang, D. Kaeli
ISPASS 2021
GNNMark: A Benchmark Suite to Characterize Graph Neural Network Training on GPUs
T. Baruah, K. Shivdikar, S. Dong, Y. Sun, S.A. Mojumder, K. Jung, J.L. Abellán, Y. Ukidave, A. Joshi, J. Kim, D. Kaeli
SC 2019
Reproducing Performance of a Multi-Physics Simulations of the Tsunamigenic 2004 Sumatra Megathrust Earthquake
C. Bunn, H. Barclay, A. Lazarev, F. Yusuf, J. Fitch, J. Booth, K. Shivdikar, D. Kaeli
04 — Experience

Career

2024 — Present
Sr. Silicon Design Engineer
GFX Architecture & IP Development
GPU microarchitecture design and hardware-software co-optimization for next-generation AMD GPUs. CDNA/RDNA compute architecture, ML workload characterization, and accelerator design.
2018 — 2024
Graduate Research Assistant
NUCAR — Northeastern University
Ph.D. research under Prof. David Kaeli. Designed NeuraChip (a CGRA accelerator for GNNs), extended the AMD CDNA microarchitecture for FHE (GME), and built cycle-accurate, multi-threaded architecture simulators.
2019 — 2020
Research Co-op — Parallel Computing Lab
Intel Labs
Two rotations with Fabrizio Petrini’s team working on high-performance computing systems and parallel architecture research.
2018
GPU Research Co-op
Omron Adept Technologies
Designed massively parallel graph traversal algorithms using BFS for robot path planning.

Education

Ph.D.
Computer Engineering
Northeastern University
2024
M.S.
Electrical & Computer Engineering
Northeastern University
2021
B.Tech.
Electrical Engineering
VJTI, University of Mumbai
2016
05 — Expertise

Technical Skills

ML & AI

  • DNN Acceleration
  • GNN Training
  • State Space Models
  • FHE on LLMs
  • PyTorch / TensorFlow

Architecture

  • GPU Microarchitecture
  • CGRA Design
  • Sparse Accelerators
  • On-chip Networks
  • PIM Systems

Simulators

  • NeuraSim (CGRA)
  • NaviSim (AMD GPU)
  • Sniper (x86)
  • SST (Accelerators)
  • FHESim (FPGA)

Languages

  • C / C++
  • CUDA / ROCm
  • OpenCL
  • Python
  • SEAL / OpenFHE
06 — Recognition

Awards & Service

2024
Top Pick in Hardware & Embedded Security
GME paper selected as a Top Pick in HES 2024, with an invitation to the journal special issue.
2023
MICRO Student Travel Grant
Awarded to present GME at MICRO 2023 in Toronto, Canada.
2019
Best Poster Award — RISE Expo
Awarded for “The Prime Hexagon” at Northeastern’s Research, Innovation, Scholarship, and Entrepreneurship (RISE) Expo.
2018
Graduate Innovator Award — RISE Expo
Awarded for Pi-Tiles: distributed matrix multiplication over tiled data using 64 Raspberry Pis.
2023
HPCA Submission Chair
Developed a paper-to-reviewer matching algorithm for IEEE HPCA 2023 conference submissions.
2020
GPU Programming Instructor
Taught the CUDA parallel programming course for undergraduate and graduate students at Northeastern.
2022
REU Pathways Mentor
Mentored three students through research projects on parallel processing in graph computing.
2017–18
Student Cluster Contest Mentor
Mentored Northeastern’s team for the Student Cluster Competition at SC’17 and SC’18.