Starred repositories

This repository contains data pre-processing and visualization scripts used in the GENEA Challenge 2022 and 2023. Check the repository's README.md file for instructions on how to use the scripts yourself.

Python 28 6 Updated May 29, 2025

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,141 223 Updated May 19, 2025

🎙️ [Large model] A 0.1B omni-modal (Omni) model trained from scratch, capable of listening, speaking, and seeing!

Python 734 75 Updated May 8, 2026

X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech interaction with a lightweight, pure-Python, production-rea…

Python 205 24 Updated Apr 29, 2026

A curated list of full-duplex spoken dialogue models & benchmarks

66 2 Updated May 5, 2026

Towards Self-Evolving Proactive AI with Perpetual Memory

Python 194 21 Updated Apr 17, 2026

SALMONN family: A suite of advanced multi-modal LLMs

1,423 113 Updated Apr 20, 2026

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

Python 13,008 1,117 Updated May 9, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 79,514 16,602 Updated May 10, 2026

A production-grade, multi-modal voice gateway providing real-time audio-to-audio interaction, read-aloud TTS, transcription, and model introspection. Built on vLLM-Omni architecture with Qwen3 models.

Python 2 Updated Jan 31, 2026

Run Qwen3 Omni - A multimodal AI assistant demo

TypeScript 72 16 Updated Oct 16, 2025

N.E.K.O. — A proactive, native omni AI companion that suggests what to watch, read, know, and play — then joins in with an embodied emotional engine.

Python 1,061 147 Updated May 10, 2026

This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API.

TypeScript 6,855 1,085 Updated Jan 7, 2026

The repo is finally unlocked. enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.

Rust 190,864 109,806 Updated May 9, 2026

End-to-end realtime stack for connecting humans and AI

Go 18,571 1,968 Updated May 9, 2026

A real-time conversation application built on Alibaba Cloud's TTS, LLM, and STT models

TypeScript 22 11 Updated Jun 4, 2024

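🟢🌍 Latest for 2026: a highly detailed, fast, privacy-focused one-click install script for Hysteria2 that unlocks GPT and Netflix by default; 🛡️ includes a VPN security check guide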

Shell 41 4 Updated Apr 28, 2026

A framework for efficient model inference with omni-modality models

Python 4,668 898 Updated May 9, 2026

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,737 256 Updated Apr 23, 2026

OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams

Python 95 2 Updated Mar 15, 2026

(arXiv 2026) PyTorch implementation of “PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization”

Python 34 1 Updated Apr 20, 2026

Official implementation of Kimodo, a kinematic motion diffusion model for high-quality human(oid) motion generation.

Python 2,267 239 Updated May 3, 2026

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 10,168 951 Updated May 5, 2026

Open Source framework for voice and multimodal conversational AI

Python 11,984 2,018 Updated May 9, 2026

A framework for building realtime voice AI agents 🤖🎙️📹

Python 10,409 3,112 Updated May 9, 2026

A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone

Python 24,528 1,908 Updated May 7, 2026

The official implementation of VITA, VITA15, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.

Python 156 2 Updated Oct 28, 2025

MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3+ generation speedup on reasoning tasks

Jupyter Notebook 8,860 571 Updated Feb 11, 2026

A simulation evaluation platform for DROID

Python 196 31 Updated Mar 16, 2026

[ICML 2026] Official codebase for "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation"

Python 626 40 Updated Apr 30, 2026