YOU CAN REPLICATE THIS PROJECT -> https://github.com/Nagharjun17/MLIR-to-PTX-CUDA
- Built a custom MLIR dialect (mcomp) with a fused op, mcomp.fuse_add_relu.
- Implemented passes: fuse arith.addf and relu into mcomp.fuse_add_relu, and lower back to arith.
- Extended lowering pipeline: MLIR → LLVM IR → PTX (via NVVM).
- Verified outputs by compiling test MLIR files to LLVM IR and GPU PTX assembly.
- Demonstrated PTX kernel emission for NVIDIA GPUs (RTX 3060).
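
To make the fusion idea concrete, here is a minimal Python sketch of the kind of rewrite the fusion pass performs. It is a toy model, not the project's code and not the MLIR API: the `Op` class, the `fuse_add_relu` function, and the stand-in op name `relu` are all hypothetical, and the single-use check is an assumption about what a safe fusion would require. The real pass lives in the linked repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Toy SSA operation: result = name(operands...). Hypothetical, not MLIR."""
    result: str
    name: str
    operands: tuple


def fuse_add_relu(ops):
    """Rewrite relu(arith.addf(a, b)) into a single mcomp.fuse_add_relu(a, b).

    Assumption: the add is only absorbed when the relu is its sole user,
    so no other op loses the value it depends on.
    """
    # Count how many times each SSA value appears as an operand.
    uses = {}
    for op in ops:
        for value in op.operands:
            uses[value] = uses.get(value, 0) + 1

    defs = {op.result: op for op in ops}
    absorbed = set()  # results of adds folded into a fused op
    rewritten = []
    for op in ops:
        if op.name == "relu":  # "relu" is a stand-in name for illustration
            producer = defs.get(op.operands[0])
            if (
                producer is not None
                and producer.name == "arith.addf"
                and uses[producer.result] == 1
            ):
                absorbed.add(producer.result)
                rewritten.append(
                    Op(op.result, "mcomp.fuse_add_relu", producer.operands)
                )
                continue
        rewritten.append(op)

    # Drop the adds that were absorbed into fused ops.
    return [op for op in rewritten if op.result not in absorbed]


program = [
    Op("%0", "arith.addf", ("%a", "%b")),
    Op("%1", "relu", ("%0",)),
]
for op in fuse_add_relu(program):
    print(op.result, "=", op.name, op.operands)
```

The fused op keeps the relu's result name, so later users of that value need no changes. If the add's result has a second user, the sketch leaves both ops untouched.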