🤯
Computing

Organizations

@FDPS @rioyokotalab @mori-lab @rapidsai @wmmae @hpc-wakate

GEMMul8 (GEMMulate): GEMM emulation using INT8/FP8 matrix engines based on the Ozaki Scheme II

C++ 69 14 Updated Apr 6, 2026

Collection of articles about PhD life, written in 🇯🇵

338 8 Updated Apr 3, 2026

The book "Performance Analysis and Tuning on Modern CPU"

TeX 3,538 247 Updated Jun 9, 2025

LLM training in simple, raw C/CUDA

Cuda 29,838 3,579 Updated Jun 26, 2025

The official Vim repository

Vim Script 40,338 6,043 Updated May 7, 2026

A K-SVD implementation written in Python.

Python 115 23 Updated Dec 26, 2022

Itoyori: A distributed multi-threading runtime system for global-view fork-join task parallelism

C++ 23 2 Updated Feb 9, 2024

A lightweight TUI (ncurses-like) display manager for Linux and BSD (mirror of https://codeberg.org/fairyglade/ly).

Zig 7,262 346 Updated May 4, 2026

int8_t and int16_t matrix multiply based on https://arxiv.org/abs/1705.01991

C++ 74 25 Updated Dec 30, 2023

The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.

C++ 5,019 456 Updated Apr 29, 2026

Synchronize your working directory efficiently to a remote place without committing the changes.

Go 75 11 Updated Nov 7, 2022

GPTPU for SC 2021

C++ 52 9 Updated Mar 22, 2023

Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.

C++ 622 92 Updated Sep 11, 2024

stdgpu: Efficient STL-like Data Structures on the GPU

C++ 1,261 98 Updated Apr 10, 2026

Linux Kernel for Surface Devices

Shell 7,223 320 Updated May 7, 2026

A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).

C++ 572 74 Updated Sep 15, 2025

Dear ImGui: Bloat-free Graphical User interface for C++ with minimal dependencies

C++ 73,058 11,755 Updated May 7, 2026

Templight is a Clang-based tool to profile the time and memory consumption of template instantiations and to perform interactive debugging sessions to gain introspection into the template instantia…

C++ 792 43 Updated Dec 7, 2024

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,632 647 Updated Feb 15, 2025

Test suite for probing the numerical behavior of NVIDIA tensor cores

Cuda 42 15 Updated Jul 24, 2024

Important concepts in numerical linear algebra and related areas

808 69 Updated Jan 13, 2024

A massively-parallel, block-sparse tensor framework written in C++

C++ 316 59 Updated Apr 29, 2026

Parallel Library for Tensor Network Methods

C++ 32 8 Updated Apr 23, 2026

⚡ Dark powered Vim/Neovim plugin manager

Vim Script 3,435 192 Updated Sep 13, 2025

gpuprec: Extended-Precision Libraries on GPUs

Cuda 41 7 Updated Jan 9, 2016

Crow is a very fast and easy-to-use C++ micro web framework (inspired by Python Flask)

C++ 7,626 879 Updated Jun 6, 2024

A compact split ortholinear keyboard.

Python 977 182 Updated Nov 20, 2022

Binary Neural Network Framework for FPGA(Differentiable LUT)

C++ 172 23 Updated Aug 12, 2025

rust-cuda working group

64 6 Updated Jun 12, 2019