Kaustubh Shivdikar

Revision as of 12:06, 27 September 2022

Boston, MA

Hi, I am Kaustubh, a Ph.D. candidate studying computer engineering in the NUCAR lab at Northeastern University, advised by David Kaeli. My research focuses on designing hardware accelerators for sparse graph workloads.

My expertise lies in:

  • Computer Architecture Simulator Design
  • Graph Neural Network Accelerators
  • Sparse Matrix Accelerators
  • Homomorphic Encryption Accelerators
  • GPU Kernel Design

Contact: shivdikar.k [at] northeastern [dot] edu, mail [at] kaustubh [dot] us

ResearchGate Google Scholar

Education

  • PhD - Computer Engineering, Northeastern University [Expected Fall 2023]
  • MS - Electrical and Computer Engineering, Northeastern University [May 2021]
  • BS - Electrical Engineering, Veermata Jijabai Technological Institute [May 2016]

Work

  • Summer-Fall 2018 Coop: Mobile Robotics @ Omron Adept with George Paul.

Recent News

  • June 2022: Mentored Lina Adkins for the GNN Acceleration project at REU-Pathways program
  • May 2022: Served as Submission chair for HPCA 2023 conference.
  • Jan 2020: Taught the GPU Programming Course at NEU
  • April 2019: Graduate Innovator Award at the RISE 2019 Research Expo for our poster Pi-Tiles
  • April 2018: Best Poster Award at the RISE 2018 Research Expo for our poster The Prime Hexagon
  • Nov 2018: Mentored the NEU team for the Student Cluster Contest at the Supercomputing Conference 2018
  • Nov 2017: Joined the NEU team for the Student Cluster Contest at the Supercomputing Conference 2017

Publications

Accelerating Polynomial Multiplication for Homomorphic Encryption on GPUs

(SEED 2022) [PDF] [Slides] [RG]

Abstract
Fully Homomorphic Encryption (FHE) enables users to securely outsource both the storage and computation of sensitive data to untrusted servers. Not only does FHE offer an attractive solution for security in cloud systems, but lattice-based FHE systems are also believed to be resistant to attacks by quantum computers. However, current FHE implementations suffer from prohibitively high latency. For lattice-based FHE to become viable for real-world systems, it is necessary for the key bottlenecks---particularly polynomial multiplication---to be highly efficient.

In this paper, we present a characterization of GPU-based implementations of polynomial multiplication. We begin with a survey of modular reduction techniques and analyze several variants of the widely-used Barrett modular reduction algorithm. We then propose a modular reduction variant optimized for 64-bit integer words on the GPU, obtaining a 1.8x speedup over the existing comparable implementations.


Next, we explore the following GPU-specific improvements for polynomial multiplication targeted at optimizing latency and throughput: 1) We present a 2D mixed-radix, multi-block implementation of NTT that results in a 1.85x average speedup over the previous state-of-the-art. 2) We explore shared memory optimizations aimed at reducing redundant memory accesses, further improving speedups by 1.2x. 3) Finally, we fuse the Hadamard product with neighboring stages of the NTT, reducing the twiddle factor memory footprint by 50%. By combining our NTT optimizations, we achieve an overall speedup of 123.13x and 2.37x over the previous state-of-the-art CPU and GPU implementations of NTT kernels, respectively.
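To make the modular-reduction bottleneck concrete, here is a minimal sketch of textbook Barrett reduction: replace the division in `x mod q` with a multiply by a precomputed reciprocal and a shift. This is the generic variant, not the paper's GPU-tuned 64-bit implementation; the function name `barrett_reduce` is illustrative.

```python
def barrett_reduce(x, q, k=None, mu=None):
    """Compute x mod q (for 0 <= x < q*q) without a hardware divide,
    using the textbook Barrett method."""
    if k is None:
        k = q.bit_length()
    if mu is None:
        mu = (1 << (2 * k)) // q      # floor(2^(2k) / q), precomputed once per modulus
    t = (x * mu) >> (2 * k)           # cheap estimate of x // q (may undershoot slightly)
    r = x - t * q
    while r >= q:                     # at most a couple of corrective subtractions
        r -= q
    return r
```

In an NTT kernel, `mu` is computed once per modulus, so every butterfly pays only multiplies, shifts, and subtractions instead of a 128-bit division.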


FHE protects against network insecurities in untrusted cloud services, enabling users to securely offload sensitive data

Authors: Kaustubh Shivdikar, Gilbert Jonatan, Evelio Mora, Neal Livesay, Rashmi Agrawal, Ajay Joshi, José L. Abellán, John Kim, David Kaeli
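The NTT pipeline the abstract optimizes (forward NTT, Hadamard product, inverse NTT) can be illustrated with a direct O(n^2) reference implementation; the paper's 2D mixed-radix, shared-memory GPU kernels are far more elaborate. The helper names `find_root`, `ntt`, and `poly_mul_mod` are assumptions of this sketch.

```python
def find_root(n, q):
    """Brute-force a primitive n-th root of unity mod prime q
    (n a power of two dividing q - 1). Real libraries precompute these."""
    for g in range(2, q):
        r = pow(g, (q - 1) // n, q)
        if pow(r, n // 2, q) != 1:   # order is exactly n, not a proper divisor
            return r

def ntt(a, root, q):
    # Direct O(n^2) number-theoretic transform: evaluate at powers of root.
    n = len(a)
    return [sum(a[j] * pow(root, j * k, q) for j in range(n)) % q
            for k in range(n)]

def poly_mul_mod(a, b, q):
    """Cyclic (mod x^n - 1) polynomial product: NTT -> pointwise -> inverse NTT."""
    n = len(a)
    root = find_root(n, q)
    A, B = ntt(a, root, q), ntt(b, root, q)
    C = [(x * y) % q for x, y in zip(A, B)]          # the Hadamard product
    inv_root, inv_n = pow(root, q - 2, q), pow(n, q - 2, q)
    c = ntt(C, inv_root, q)                          # inverse NTT, up to a 1/n factor
    return [(x * inv_n) % q for x in c]
```

The fused Hadamard-product optimization in the paper folds the pointwise multiply in `poly_mul_mod` into the neighboring NTT stages, halving the twiddle-factor footprint.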




JAXED: Reverse Engineering DNN Architectures Leveraging JIT GEMM Libraries

(SEED 2021) [PDF] [Slides] [Poster] [RG]

Abstract
General matrix multiplication (GEMM) libraries on x86 architectures have recently adopted Just-in-Time (JIT) optimizations to dramatically reduce the execution time of small and medium-sized matrix multiplication. These optimizations target the latest CPU architectural extensions, such as AVX2 and AVX-512. Although JIT compilers can provide impressive speedups to GEMM libraries, they expose a new attack surface through the built-in JIT code caches. These software-based caches allow an adversary to extract sensitive information through carefully designed timing attacks. The attack surface of such libraries has become more prominent due to their widespread integration into popular Machine Learning (ML) frameworks such as PyTorch and TensorFlow.


In our paper, we present a novel attack strategy for JIT-compiled GEMM libraries called JAXED. We demonstrate how an adversary can exploit the GEMM library's vulnerable state management to extract confidential CNN model hyperparameters. We show that using JAXED, one can successfully extract the hyperparameters of models with fully-connected layers with an average accuracy of 92%. Further, we demonstrate our attack against the final fully connected layer of 10 popular DNN models. Finally, we perform an end-to-end attack on MobileNetV2, on both the convolution and FC layers, successfully extracting model hyperparameters.
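The core leakage channel can be sketched with a toy, deterministic model: a JIT GEMM library compiles one kernel per matrix shape and caches it, so a probe that hits the victim's cached shape skips compilation. Here a compile counter stands in for the timing signal; the real attack measures latency on actual JIT GEMM libraries, and the names `ToyJITGemm` and `probe` are invented for this sketch.

```python
class ToyJITGemm:
    """Toy model of a JIT GEMM library: one kernel compiled and cached
    per matrix shape, shared between victim and attacker."""
    def __init__(self):
        self.cache = set()
        self.compiles = 0

    def gemm(self, shape):
        if shape not in self.cache:      # cache miss -> observable compile cost
            self.cache.add(shape)
            self.compiles += 1

def probe(lib, candidate_shapes):
    """Attacker probes candidate layer shapes; shapes the victim already ran
    are cache hits (no new compile), revealing the model's hyperparameters."""
    hits = []
    for s in candidate_shapes:
        before = lib.compiles
        lib.gemm(s)
        if lib.compiles == before:       # no compile happened: the victim used s
            hits.append(s)
    return hits
```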


Attack Surface: After the victim’s execution, the victim leaves behind information about its model hyperparameters in the JIT code cache. The attacker probes this JIT code cache through the attacker’s ML model and observes timing information to determine the victim’s model hyperparameters.

Authors: Malith Jayaweera, Kaustubh Shivdikar, Yanzhi Wang, David Kaeli




GNNMark: A benchmark suite to characterize graph neural network training on GPUs

(ISPASS 2021) [PDF] [RG]

Abstract
Graph Neural Networks (GNNs) have emerged as a promising class of Machine Learning algorithms to train on non-Euclidean data. GNNs are widely used in recommender systems, drug discovery, text understanding, and traffic forecasting. Due to their energy efficiency and high-performance capabilities, GPUs are a natural choice for accelerating the training of GNNs. Thus, we want to better understand the architectural and system-level implications of training GNNs on GPUs. Presently, there is no benchmark suite available designed to study GNN training workloads.


In this work, we address this need by presenting GNNMark, a feature-rich benchmark suite that covers the diversity present in GNN training workloads, datasets, and GNN frameworks. Our benchmark suite consists of GNN workloads that utilize a variety of different graph-based data structures, including homogeneous graphs, dynamic graphs, and heterogeneous graphs commonly used in the application domains mentioned above. We use this benchmark suite to explore and characterize GNN training behavior on GPUs. We study a variety of aspects of GNN execution, including both compute and memory behavior, highlighting major bottlenecks observed during GNN training. At the system level, we study various aspects, including the scalability of training GNNs across a multi-GPU system, as well as the sparsity of the data encountered during training. The insights derived from our work can be leveraged by both hardware and software developers to improve both the hardware and software performance of GNN training on GPUs.
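The workload class being characterized can be illustrated with a single dense message-passing step: aggregate each node's neighbor features, then apply a learned transform and nonlinearity. This is a toy sketch (dense adjacency, mean aggregation, names `gcn_layer`, `adj`, `X`, `W` are assumptions); the GNNMark workloads run sparse kernels inside full training frameworks.

```python
import numpy as np

def gcn_layer(adj, X, W):
    """One graph-convolution step: mean-aggregate neighbor features,
    apply a linear transform, then ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    H = (adj @ X) / np.maximum(deg, 1)   # average over each node's neighbors
    return np.maximum(H @ W, 0)          # ReLU
```

The `adj @ X` aggregation is exactly the sparse-matrix product that dominates GNN training time on GPUs and motivates the sparsity study above.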


Graph Neural Network Analysis

Authors: Trinayan Baruah, Kaustubh Shivdikar, Shi Dong, Yifan Sun, Saiful A Mojumder, Kihoon Jung, José L. Abellán, Yash Ukidave, Ajay Joshi, John Kim, David Kaeli




SMASH: Sparse Matrix Atomic Scratchpad Hashing

(MS Thesis, 2021) [PDF] [RG]

Abstract
Sparse matrices, more specifically Sparse Matrix-Matrix Multiply (SpGEMM) kernels, are commonly found in a wide range of applications, spanning graph-based path-finding to machine learning algorithms (e.g., neural networks). A particular challenge in implementing SpGEMM kernels has been the pressure placed on DRAM memory. One approach to tackle this problem is to use an inner product method for the SpGEMM kernel implementation. While the inner product produces fewer intermediate results, it can end up saturating the memory bandwidth, given the high number of redundant fetches of the input matrix elements. Using an outer product-based SpGEMM kernel can reduce redundant fetches, but at the cost of increased overhead due to extra computation and memory accesses for producing/managing partial products.


In this thesis, we introduce a novel SpGEMM kernel implementation based on the row-wise product approach. We leverage atomic instructions to merge intermediate partial products as they are generated. The use of atomic instructions eliminates the need to create partial product matrices, thus eliminating redundant DRAM fetches.

To evaluate our row-wise product approach, we map an optimized SpGEMM kernel to a custom accelerator designed to accelerate graph-based applications. The targeted accelerator is an experimental system named PIUMA, being developed by Intel. PIUMA provides several attractive features, including fast context switching, user-configurable caches, globally addressable memory, non-coherent caches, and asynchronous pipelines. We tailor our SpGEMM kernel to exploit many of the features of the PIUMA fabric.

This thesis compares our SpGEMM implementation against prior solutions, all mapped to the PIUMA framework. We briefly describe some of the PIUMA architecture features and then delve into the details of our optimized SpGEMM kernel. Our SpGEMM kernel can achieve 9.4x speedup as compared to competing approaches.
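The row-wise product approach can be sketched in a few lines: for each nonzero A[i,k], scale row k of B and merge the partial products for row i in a per-row hash table, which is the role the atomic scratchpad plays in SMASH. This is a plain-Python sketch using dict-of-dicts sparse matrices, not the PIUMA kernel; `spgemm_rowwise` is an invented name.

```python
def spgemm_rowwise(A, B):
    """Row-wise-product SpGEMM on dict-of-dicts sparse matrices
    ({row: {col: value}}): merge partial products as they are generated,
    so no intermediate partial-product matrices are materialized."""
    C = {}
    for i, a_row in A.items():
        acc = {}                                        # scratchpad hash table for row i
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj    # merge partial product in place
        if acc:
            C[i] = acc
    return C
```

On PIUMA, the `acc[j] = acc.get(j, 0) + ...` merge is performed with atomic instructions in the scratchpad, which is what eliminates the redundant DRAM traffic discussed above.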


The SMASH Algorithm




Student cluster competition 2018, team northeastern university: Reproducing performance of a multi-physics simulations of the Tsunamigenic 2004 Sumatra Megathrust earthquake on the AMD EPYC 7551 architecture

(SC 2018) [PDF] [RG]

Abstract
This paper evaluates the reproducibility of a Supercomputing 17 paper titled Extreme Scale Multi-Physics Simulations of the Tsunamigenic 2004 Sumatra Megathrust Earthquake. We evaluate reproducibility on a significantly smaller computer system than used in the original work. We found that we were able to demonstrate reproducibility of the multi-physics simulations on a single-node system, as well as confirm multi-node scaling. However, reproducibility of the visual and geophysical simulation results was inconclusive due to issues related to input parameters provided to our model. The SC 17 paper provided results for both CPU-based simulations as well as Xeon Phi based simulations. Since our cluster uses NVIDIA V100s for acceleration, we are only able to assess the CPU-based results in terms of reproducibility.

Horizontal Seafloor displacement simulation

Authors: Chris Bunn, Harrison Barclay, Anthony Lazarev, Toyin Yusuf, Jason Fitch, Jason Booth, Kaustubh Shivdikar, David Kaeli




Speeding up DNNs using HPL based Fine-grained Tiling for Distributed Multi-GPU Training

(BARC 2018) [PDF] [RG]




Video steganography using encrypted payload for satellite communication

(Aerospace Conference 2017) [PDF] [RG]




Missing 'Middle Scenarios' Uncovering Nuanced Conditions in Latin America's Housing Crisis

(Cityscape 2017) [PDF] [RG]




Dynamic power allocation using Stackelberg game in a wireless sensor network

(Aerospace Conference 2016) [PDF] [RG]




Automatic image annotation using a hybrid engine

(Indicon 2015) [PDF] [RG]

Posters

  • JAXED [PDF]
  • Pi-Tiles (Graduate Innovator Award) [PDF]
  • The Prime Hexagon (Best Poster Award) [PDF]

What is KTB Wiki?

KTB Wiki, because the best way to store your knowledge is in an indexed SQL database.

This website was built on KTB Wiki. KTB Wiki is my side project, an attempt to consolidate knowledge gained during my Ph.D. journey. Though many other platforms provide a similar service, the process of creating KTB Wiki was a learning experience, since it taught me concepts of indexing, load balancing, and in-memory file systems. KTB Wiki was built using MediaWiki and is intended for research purposes only.

Interesting Reads