Lists (29)
3DGS
Agent
AIGC
Animation
Calibration
Concept
DIBR
DigitalHuman
Fusion
GPT
ImageTask2D
Library
LLM
LocoManip
MeshProcess
MM-Interaction
Motion
NERF
ObjectGeneration
Reconstruction
Render
Robot
SceneGen
Survey
Tools
VideoGen
VideoInterpolation
VLA
WorldModel
Starred repositories
This repository contains data pre-processing and visualization scripts used in the GENEA Challenge 2022 and 2023. Check the repository's README.md file for instructions on how to use the scripts yourself.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
🎙️ A 0.1B omni-modal model trained from scratch, capable of listening, speaking, and seeing!
X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech interaction with a lightweight, pure-Python, production-rea…
A curated list of full-duplex spoken dialogue models & benchmarks
Towards Self-Evolving Proactive AI with Perpetual Memory
SALMONN family: A suite of advanced multi-modal LLMs
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
A high-throughput and memory-efficient inference and serving engine for LLMs
A production-grade, multi-modal voice gateway providing real-time audio-to-audio interaction, read-aloud TTS, transcription, and model introspection. Built on vLLM-Omni architecture with Qwen3 models.
Run Qwen3 Omni - A multimodal AI assistant demo
N.E.K.O. — A proactive, native omni AI companion that suggests what to watch, read, know, and play — then joins in with an embodied emotional engine.
This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API.
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.
End-to-end realtime stack for connecting humans and AI
A real-time conversational application built on Alibaba Cloud TTS, LLM, and STT models.
🟢🌍 The latest (2026) detailed, fast, privacy-focused one-click installation script for Hysteria2, unlocking GPT and Netflix by default; 🛡️ includes a VPN security check guide.
A framework for efficient model inference with omni-modality models
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams
(arXiv 2026) PyTorch implementation of “PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization”
Official implementation of Kimodo, a kinematic motion diffusion model for high-quality human(oid) motion generation.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Open Source framework for voice and multimodal conversational AI
A framework for building realtime voice AI agents 🤖🎙️📹
A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone
The official implementation of VITA, VITA-1.5, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3+ generation speedup on reasoning tasks
[ICML 2026] Official codebase for "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation"