Lists (29)
3DGS
Agent
AIGC
Animation
Calibration
Concept
DIBR
DigitalHuman
Fusion
GPT
ImageTask2D
Library
LLM
LocoManip
MeshProcess
MM-Interaction
Motion
NERF
ObjectGeneration
Reconstruction
Render
Robot
SceneGen
Survey
Tools
VideoGen
VideoInterpolation
VLA
WorldModel
Starred repositories
This repository contains data pre-processing and visualization scripts used in the GENEA Challenge 2022 and 2023. Check the repository's README.md file for instructions on how to use the scripts yourself.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
🎙️ A 0.1B omni-modal model trained from scratch, capable of listening, speaking, and seeing!
X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech interaction with a lightweight, pure-Python, production-rea…
A curated list of full-duplex spoken dialogue models & benchmarks
Towards Self-Evolving Proactive AI with Perpetual Memory
SALMONN family: A suite of advanced multi-modal LLMs
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
A high-throughput and memory-efficient inference and serving engine for LLMs
A production-grade, multi-modal voice gateway providing real-time audio-to-audio interaction, read-aloud TTS, transcription, and model introspection. Built on vLLM-Omni architecture with Qwen3 models.
Run Qwen3 Omni - A multimodal AI assistant demo
N.E.K.O. — A proactive, native omni AI companion that suggests what to watch, read, know, and play — then joins in with an embodied emotional engine.
This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API.
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.
End-to-end realtime stack for connecting humans and AI
A real-time conversational application built on Alibaba Cloud TTS, LLM, and STT models.
🟢🌍 The latest (2026) detailed, fast, privacy-focused one-click installation script for Hysteria2, unlocking GPT and Netflix by default; 🛡️ includes a VPN security check guide.
A framework for efficient model inference with omni-modality models
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams
(arXiv 2026) PyTorch implementation of “PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization”
Official implementation of Kimodo, a kinematic motion diffusion model for high-quality human(oid) motion generation.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Open Source framework for voice and multimodal conversational AI
A framework for building realtime voice AI agents 🤖🎙️📹
A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone
The official implementation of VITA, VITA-1.5, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3+ generation speedup on reasoning tasks
[ICML 2026] Official codebase for "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation"