Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving) [Security hardening in progress, interaction paused...
Speech recognition module for Python, supporting several engines and APIs, online and offline.
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
A data augmentations library for audio, image, text, and video.
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
🎵 🌈 Real-time LED strip music visualization using Python and the ESP8266 or Raspberry Pi...
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)...
Cast macOS and Linux Audio/Video to your Google Cast and Sonos Devices
A GUI frontend for @werman's PulseAudio real-time noise suppression plugin
🔓 Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
Data manipulation and transformation for audio signal processing, powered by PyTorch
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding...
The PyTorch-based audio source separation toolkit for researchers
WaveGAN: Learn to synthesize raw audio with generative adversarial networks
SincNet is a neural architecture for efficiently processing raw audio samples.
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in ind...
A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning....
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check...
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with M...
GUI for a Vocal Remover that uses Deep Neural Networks.
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper....
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation...
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
A sound cloning tool with a web interface that records audio using your voice or any other sound.
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild...
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineer...
Use PEFT or full-parameter training to fine-tune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM...
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate....
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio....
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
An open-source multimodal large language model that can hear and talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversation...
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Languag...