Kaustubh Shivdikar
Last edited 207 days ago by Kaustubh Shivdikar
I am a Ph.D. candidate in the NUCAR lab at Northeastern University, working under the guidance of Dr. David Kaeli. My research focuses on designing hardware accelerators for sparse graph workloads.
My expertise lies in:
- Computer Architecture Simulator Design
- Graph Neural Network Accelerators
- Sparse Matrix Accelerators
- Homomorphic Encryption Accelerators
- GPU Kernel Design
Contact: shivdikar.k [at] northeastern [dot] edu, mail [at] kaustubh [dot] us
Education
- Ph.D., Computer Engineering, Northeastern University, Boston [Expected Fall 2022]
- M.S., Electrical and Computer Engineering, Northeastern University, Boston [May 2021]
- B.S., Electrical Engineering, Veermata Jijabai Technological Institute, University of Mumbai, India [May 2016]
Work
- Summer-Fall 2020 Co-op: Parallel Computing Lab (Fabrizio Petrini), Intel. Developed novel architectural features.
- Summer-Fall 2019 Co-op: Parallel Computing Lab (Fabrizio Petrini), Intel. Designed SpGEMM kernels for Intel’s new architecture.
- Summer-Fall 2018 Co-op: Omron Adept (George Paul). Implemented parallel graph traversal algorithms for robot path planning.
Recent News
- May 2022: Served as Submission co-chair for the HPCA 2022 conference.
- April 2019: Graduate Innovator Award at the RISE 2019 Research Expo for our poster Pi-Tiles
- April 2018: Best Poster Award at the RISE 2018 Research Expo for our poster The Prime Hexagon
- November 2018: Mentored the NEU team for the Student Cluster Contest at the Super Computing Conference 2018
- November 2017: Joined the NEU team for the Student Cluster Contest at the Super Computing Conference 2017
Publications
- Accelerating Polynomial Multiplication for Homomorphic Encryption on GPUs
  Abstract: Fully Homomorphic Encryption (FHE) enables users to securely outsource both the storage and computation of sensitive data to untrusted servers. Not only does FHE offer an attractive solution for security in cloud systems, but lattice-based FHE systems are also believed to be resistant to attacks by quantum computers. However, current FHE implementations suffer from prohibitively high latency. For lattice-based FHE to become viable for real-world systems, it is necessary for the key bottlenecks, particularly polynomial multiplication, to be highly efficient.
  In this paper, we present a characterization of GPU-based implementations of polynomial multiplication. We begin with a survey of modular reduction techniques and analyze several variants of the widely-used Barrett modular reduction algorithm. We then propose a modular reduction variant optimized for 64-bit integer words on the GPU, obtaining a 1.8x speedup over the existing comparable implementations. Next, we explore the following GPU-specific improvements for polynomial multiplication targeted at optimizing latency and throughput: 1) We present a 2D mixed-radix, multi-block implementation of NTT that results in a 1.85x average speedup over the previous state-of-the-art. 2) We explore shared memory optimizations aimed at reducing redundant memory accesses, further improving speedups by 1.2x. 3) Finally, we fuse the Hadamard product with neighboring stages of the NTT, reducing the twiddle factor memory footprint by 50%. By combining our NTT optimizations, we achieve an overall speedup of 123.13x and 2.37x over the previous state-of-the-art CPU and GPU implementations of NTT kernels, respectively.
- JAXED: Reverse Engineering DNN Architectures Leveraging JIT GEMM Libraries
- GNNMark: A benchmark suite to characterize graph neural network training on GPUs
- SMASH: Sparse Matrix Atomic Scratchpad Hashing
- Student cluster competition 2018, team northeastern university: Reproducing performance of a multi-physics simulations of the Tsunamigenic 2004 Sumatra Megathrust earthquake on the AMD EPYC 7551 architecture
- Speeding up DNNs using HPL based Fine-grained Tiling for Distributed Multi-GPU Training
- Video steganography using encrypted payload for satellite communication
- Missing 'Middle Scenarios': Uncovering Nuanced Conditions in Latin America's Housing Crisis
- Dynamic power allocation using Stackelberg game in a wireless sensor network
- Automatic image annotation using a hybrid engine
What is KTB Wiki?
This website was built on KTB Wiki. KTB Wiki is my side project, an attempt to consolidate the knowledge I gained during my Ph.D. journey. Though many other platforms provide a similar service, creating KTB Wiki was a learning experience: it taught me concepts such as indexing, load balancing, and in-memory file systems. KTB Wiki was built using MediaWiki and is intended for research purposes only.
KTB Wiki, because the best way to store your knowledge is in an indexed SQL database.
Hobbies
Interesting Reads
Coming soon...