Hardware architect at AMD designing next-generation GPU microarchitecture. Ph.D. in Computer Engineering from Northeastern University, with research spanning GNN acceleration, homomorphic encryption on GPUs, and hardware-software co-design for emerging workloads.
Designing next-generation GPU compute architectures at AMD, focusing on hardware-software co-design for the CDNA and RDNA series across ML inference, training, and HPC workloads.
NeuraChip: a spatial accelerator for sparse GNN computation built on Gustavson's algorithm, with decoupled multiply and add stages and hash-based load balancing. Achieves a 22x speedup over vendor sparse libraries.
GME: GPU-based microarchitectural extensions for fully homomorphic encryption (FHE), optimizing polynomial multiplication and Barrett reduction on CDNA GPUs. Top Pick in HES 2024.
Characterizing the training of Mamba-based state space models (SSMs) on GPUs, and building workload suites that span multiple model architectures to guide hardware optimization decisions.
Reverse engineering DNN architectures by exploiting JIT GEMM code caches to extract CNN hyperparameters, exposing a side-channel vulnerability in ML inference.
Evaluating real processing-in-memory (PIM) architectures on UPMEM systems. PIMnet: a domain-specific interconnection network for efficient collective communication in scalable PIM systems.