University of Catania
- Italy, Sicily, Catania
- https://seminaraluigi.altervista.org/
- @Gigii_Gii
- @luseminara.bsky.social
- in/luigi-seminara
Starred repositories
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
Code, data and weights for the paper **What drives success in physical planning with Joint-Embedding Predictive World Models?**
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Code implementation for the paper "Large-scale Pre-training for Grounded Video Caption Generation" (ICCV 2025)
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
[IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
Using advances in generative modeling to learn reward functions from unlabeled videos.
Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024
An open-source AI agent that brings the power of Gemini directly into your terminal.
An extension of the PyTorch library containing various tools for performing deep learning in hyperbolic space.
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs [ECCV, 2024]
Official PyTorch Implementation of Masked Temporal Interpolation Diffusion for Procedure Planning in Instructional Videos
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
Implementation of Autoregressive Diffusion in PyTorch
[BMVC2022, IJCV2023, Best Student Paper, Spotlight] Official codes for the paper "In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation".