    Repositories list

    • tpu-inference

      Public
      TPU inference for vLLM, with unified JAX and PyTorch support.
      Python
      Apache License 2.0
      18632856199 Updated May 11, 2026
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      17k 80k 2k 2.9k Updated May 11, 2026
    • TypeScript
      2010 Updated May 11, 2026
    • perf-eval

      Public
      Performance benchmark & accuracy evaluation for vLLM
      Python
      2001 Updated May 11, 2026
    • HTML
      913914 Updated May 11, 2026
    • A safetensors extension to efficiently store sparse quantized tensors on disk
      Python
      Apache License 2.0
      862761326 Updated May 11, 2026
    • Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
      Python
      Apache License 2.0
      5043.2k6654 Updated May 11, 2026
    • System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge
      Go
      Apache License 2.0
      6654.2k10478 Updated May 11, 2026
    • guidellm

      Public
      Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
      Python
      Apache License 2.0
      1501.1k6428 Updated May 11, 2026
    • A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
      Python
      Apache License 2.0
      804201731 Updated May 11, 2026
    • vLLM Quantization plugin for GGUF
      Cuda
      Apache License 2.0
      1222 Updated May 11, 2026
    • vllm-omni

      Public
      A framework for efficient model inference with omni-modality models
      Python
      Apache License 2.0
      9034.7k412373 Updated May 11, 2026
    • Community maintained hardware plugin for vLLM on Intel Gaudi
      Python
      Apache License 2.0
      12939439 Updated May 11, 2026
    • vllm-ascend

      Public
      Community maintained hardware plugin for vLLM on Ascend
      Python
      Apache License 2.0
      1.2k 2.1k 1.4k 466 Updated May 11, 2026
    • Community maintained hardware plugin for vLLM on Apple Silicon
      Python
      Apache License 2.0
      1281.1k118 Updated May 11, 2026
    • recipes

      Public
      Common recipes to run vLLM
      JavaScript
      Apache License 2.0
      2657872658 Updated May 11, 2026
    • Stateful API logic for agentic applications using vLLM
      Makefile
      Apache License 2.0
      92512 Updated May 11, 2026
    • aibrix

      Public
      Cost-efficient and pluggable Infrastructure components for GenAI inference
      Go
      Apache License 2.0
      5794.8k27640 Updated May 11, 2026
    • vllm-xpu-kernels

      Public
      The vLLM XPU kernels for Intel GPU
      C++
      Apache License 2.0
      57421433 Updated May 11, 2026
    • vLLM Daily Summarization of Merged PRs
      45000 Updated May 10, 2026
    • vLLM plugin for block-based diffusion language model (dLLM) support
      Python
      Apache License 2.0
      51550 Updated May 9, 2026
    • ci-infra

      Public
      This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
      HCL
      Apache License 2.0
      6838042 Updated May 8, 2026
    • vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
      Python
      Apache License 2.0
      4022.3k9868 Updated May 7, 2026
    • vLLM Model plugin for the encoder-decoder BART model
      Python
      Apache License 2.0
      71126 Updated May 7, 2026
    • router

      Public
      A high-performance and light-weight router for vLLM large scale deployment
      Rust
      Apache License 2.0
      772231319 Updated May 6, 2026
    • Fast and memory-efficient exact attention
      Python
      BSD 3-Clause "New" or "Revised" License
      2.7k 122026 Updated May 5, 2026
    • FlashMLA

      Public
      C++
      MIT License
      1k 1303 Updated Apr 20, 2026
    • Agent skills for vLLM
      Shell
      Apache License 2.0
      206732 Updated Apr 3, 2026
    • Community maintained hardware plugin for vLLM on AWS Neuron
      Python
      Apache License 2.0
      112961 Updated Mar 20, 2026
    • Performance dashboard for vLLM
      Python
      2101 Updated Mar 10, 2026