TVM Relax: ViT Encoder to Transformer Decoder (multimodal architecture) (2025)
Python • TVM • CUDA
Benchmarking a Vision Transformer encoder + transformer decoder in TVM Relax with baseline vs. optimized pipelines.
C++ • MLIR • LLVM • CUDA
Tiny playground for a custom MLIR dialect with Add+ReLU fusion and lowering to PTX.
C++ • MLIR • LLVM
Learning by adding new operations to an existing dialect and lowering it down to LLVM.
Python • PyTorch • ONNX Runtime • Kria KV260
Detects Wi-Fi extender power loss by watching its status LED, then sends instant alerts.
Python • PyTorch • Triton
Custom implementation of Flash Attention forward/backward kernels with benchmarking.
C++ • CUDA 12.x • RTX 3060
Benchmarking matrix multiplication: a shared-memory kernel compared against a naive approach.