Kernel Coding and Machine Learning Engineer with a strong focus on performance-critical systems, computer vision, and large-scale inference.
I work across the full stack when required (which is always :) ), from low-level GPU kernels and hardware-aware optimizations to distributed backend systems and production ML pipelines.
Website: https://1y33.github.io
- GPU kernel development
- Low-level optimization of attention mechanisms, GEMM kernels, and inference pipelines
- FPGA-based GPGPU and experimental AI accelerator designs
- Deep learning models developed for multiple applied solutions
- Large Language Model training for fun
- Distributed worker systems for large-scale data processing and real-time inference
- Full-stack applications supporting ML workflows, deployment, and monitoring
|\__/,| (`\
_.|o o |_ ) )
---(((---(((--------
“If it’s slow, profile it.
If it’s still slow, write a kernel.
If it’s still slow… blame the cat.”