pyannote.audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Using the pyannote.audio open-source toolkit in production?
Consider switching to pyannoteAI for better and faster options.

pyannote speaker diarization toolkit

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it comes with state-of-the-art pretrained models and pipelines that can be further fine-tuned to your own data for even better performance.
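Fine-tuning relies on the standard PyTorch Lightning training loop. The sketch below is a minimal example, assuming your annotated data is described by a pyannote.database protocol (the database.yml path and the MyDatabase.SpeakerDiarization.MyProtocol name are placeholders), that adapts the pretrained segmentation model:

import pytorch_lightning as pl
from pyannote.audio import Model
from pyannote.audio.tasks import SpeakerDiarization
from pyannote.database import FileFinder, registry

# describe your own annotated corpus with pyannote.database (placeholder names)
registry.load_database("database.yml")
protocol = registry.get_protocol(
    "MyDatabase.SpeakerDiarization.MyProtocol",
    preprocessors={"audio": FileFinder()})

# start from the pretrained segmentation model and fine-tune it on your data
model = Model.from_pretrained(
    "pyannote/segmentation-3.0", token="HUGGINGFACE_ACCESS_TOKEN")
model.task = SpeakerDiarization(protocol, duration=10.0)

trainer = pl.Trainer(max_epochs=5)
trainer.fit(model)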

Highlights

Open-source speaker diarization pipeline

  1. Install pyannote.audio with pip install pyannote.audio
  2. Accept pyannote/segmentation-3.0 user conditions
  3. Accept pyannote/speaker-diarization-3.1 user conditions
  4. Create a Hugging Face access token at hf.co/settings/tokens.
import torch
from pyannote.audio import Pipeline
from pyannote.audio.pipelines.utils.hook import ProgressHook

# Open-source pyannote speaker diarization pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    token="HUGGINGFACE_ACCESS_TOKEN")

# send pipeline to GPU (when available)
pipeline.to(torch.device("cuda"))

# apply pretrained pipeline (with optional progress hook)
with ProgressHook() as hook:
    diarization = pipeline("audio.wav", hook=hook)  # runs locally

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
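
The pretrained pipeline also accepts a few useful options: a speaker-count hint when the number of speakers is known (or bounded), and audio that is already loaded in memory. A short sketch, reusing the pipeline instantiated above:

import torchaudio

# fix the number of speakers when it is known in advance...
diarization = pipeline("audio.wav", num_speakers=2)

# ... or only bound it
diarization = pipeline("audio.wav", min_speakers=2, max_speakers=5)

# pre-loaded audio can be passed as a {"waveform", "sample_rate"} dictionary
waveform, sample_rate = torchaudio.load("audio.wav")
diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})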

Premium pyannoteAI speaker diarization pipeline

  1. Install pyannote.audio with pip install pyannote.audio
  2. Create a pyannoteAI API key at dashboard.pyannote.ai
from pyannote.audio import Pipeline

# Premium pyannoteAI speaker diarization service
pipeline = Pipeline.from_pretrained(
    "pyannoteAI/speaker-diarization-precision", token="PYANNOTEAI_API_KEY")

diarization = pipeline("audio.wav")  # runs on pyannoteAI servers

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s {speaker}")
# start=0.2s stop=1.6s SPEAKER_00
# start=1.8s stop=4.0s SPEAKER_01 
# start=4.2s stop=5.6s SPEAKER_00
# ...
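
In both cases, the returned diarization is a pyannote.core.Annotation, so it can for instance be saved to the widely used RTTM format:

# save the diarization output in the standard RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)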

Visit docs.pyannote.ai to learn about other pyannoteAI features (voiceprinting, confidence scores, and more).

Benchmark

Out of the box, the pyannote.audio speaker diarization pipeline v3.1 is expected to be much better (and faster) than v2.x, and the pyannoteAI premium model goes one step further. The numbers below are diarization error rates (in %): the lower, the better.

Benchmark (2025-03)      v2.1   v3.1   pyannoteAI premium
AISHELL-4                14.1   12.2   12.1
AliMeeting (channel 1)   27.4   24.5   19.8
AMI (IHM)                18.9   18.8   15.8
AMI (SDM)                27.1   22.7   18.3
AVA-AVD                  66.3   49.7   45.3
CALLHOME (part 2)        31.6   28.4   20.1
DIHARD 3 (full)          26.9   21.4   17.2
Earnings21               17.0    9.4    9.0
Ego4D (dev.)             61.5   51.2   45.8
MSDWild                  32.8   25.4   19.7
RAMC                     22.5   22.2   11.1
REPERE (phase2)           8.2    7.9    7.6
VoxConverse (v0.3)       11.2   11.2    9.9

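Diarization error rate can also be computed on your own data with pyannote.metrics (installed alongside pyannote.audio). A minimal sketch, assuming a ground-truth annotation stored in a (placeholder) reference.rttm file and the diarization output obtained above:

from pyannote.database.util import load_rttm
from pyannote.metrics.diarization import DiarizationErrorRate

# load the ground-truth annotation (the key is the file URI used in the RTTM file)
reference = load_rttm("reference.rttm")["audio"]

# compare it to the pipeline output
metric = DiarizationErrorRate()
der = metric(reference, diarization)
print(f"diarization error rate = {100 * der:.1f}%")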

Documentation

Citations

If you use pyannote.audio, please use the following citations:

@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}

Development

The commands below will set up pre-commit hooks and install the packages needed for developing the pyannote.audio library.

pip install -e .[dev,testing]
pre-commit install

Test

pytest