CUSTOMER SERVICE AUDIO DATASET
6,000,000 hours of real human conversation
The single largest multilingual customer service audio dataset in the world.
Sourced from global financial services. 29 languages.
Model-Ready
Built to plug directly into your training pipeline.
• Accurate transcripts
• Fully diarized
• Dual channel
• PII redacted
Multilingual
29 languages across every major market.
• English, Spanish, French
• Arabic, Korean, Japanese
• Hindi, Urdu, Mandarin
• 20+ more
Native Metadata
The context around the call, not just the call itself.
• CSAT scores
• Geographic location
• Outcomes and summaries
• Topic & sentiment labels
Domain Density
USE CASES
What you can build with real conversation data
Frontier models are trained on clean data. Production environments aren't clean.
Robust ASR in Noisy Conditions
Train models that actually work in production
Real calls include background noise, overlapping speech, and heavy accents - the exact conditions where clean-data-trained ASR models degrade. Fine-tune on authentic acoustic environments to close the gap between benchmark WER and production WER.
Speech Recognition
Accent Adaptation
Noise Robustness
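To make the gap between benchmark WER and production WER concrete, here is a minimal sketch of how word error rate is usually computed: word-level edit distance divided by the number of reference words. The function name and the example transcripts are illustrative assumptions, not part of the dataset or any shipped tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missed)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    reference = "i want to dispute a charge on my card"
    clean_hyp = "i want to dispute a charge on my card"
    noisy_hyp = "i want to dispute the charge my card"
    print("clean-audio WER:", word_error_rate(reference, clean_hyp))
    print("noisy-audio WER:", word_error_rate(reference, noisy_hyp))
```

Scoring the same model on a clean benchmark set and on a held-out slice of real calls gives the two numbers to compare before and after fine-tuning.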
Emotion & Tone Recognition
Go beyond surface-level sentiment
Customer service audio is one of the richest natural sources of emotionally dynamic conversation. Train models to detect frustration masked by calm speech, sarcasm, escalation patterns, and tonal shifts that carry more signal than words alone.
Sentiment Analysis
Paralinguistics
Affective Computing
Pragmatic & Indirect Speech
Interpret what people mean, not what they say
"I guess I'll just figure it out myself" isn't a plan - it's a complaint. Customer calls are dense with indirect speech acts and implicit requests that frontier models still take at face value.
NLU
Intent Detection
Pragmatics
Turn-Taking & Conversation Flow
Build voice agents that don't talk over people
Real dialogue involves interruptions, backchannels, long pauses, and implicit cues about when a speaker is done vs. thinking. Train on natural patterns to build systems that handle conversational flow without awkward collisions.
Voice Agents
Dialogue Systems
Real-Time Processing
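As one illustration of working with interruptions and overlapping speech, the sketch below measures how long both parties talk at once, given diarized segments for each speaker. The `(start_seconds, end_seconds)` tuple layout and the function name are assumptions made for this example; the dataset's actual annotation schema may differ.

```python
from typing import List, Tuple

# Hypothetical segment layout: (start_seconds, end_seconds)
Segment = Tuple[float, float]


def overtalk_seconds(agent: List[Segment], caller: List[Segment]) -> float:
    """Total time during which agent and caller speak simultaneously.

    Assumes each list is sorted by start time and that segments from
    the same speaker do not overlap each other.
    """
    i = j = 0
    total = 0.0
    while i < len(agent) and j < len(caller):
        a_start, a_end = agent[i]
        c_start, c_end = caller[j]
        # Length of the intersection of the two segments, if any.
        overlap = min(a_end, c_end) - max(a_start, c_start)
        if overlap > 0:
            total += overlap
        # Advance whichever segment finishes first.
        if a_end <= c_end:
            i += 1
        else:
            j += 1
    return total


if __name__ == "__main__":
    agent_turns = [(0.0, 4.0), (6.0, 9.0)]
    caller_turns = [(3.0, 6.5), (8.0, 10.0)]
    print("overtalk (s):", overtalk_seconds(agent_turns, caller_turns))
```

A per-call overtalk figure like this can serve as a filter for selecting interruption-heavy calls when training or evaluating turn-taking behavior.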
Code-Switching & Multilingual Mixing
Handle language the way people actually speak it
Diverse customer bases produce natural intra-sentential code-switching - Spanglish, Hinglish, Cantonese-English blends. Models handle each language in isolation but break at the seams. Train on real multilingual speech.
Multilingual
Code-Switching
Language ID
Noisy Transcript Comprehension
Extract meaning from imperfect ASR output
Production pipelines produce run-on, unpunctuated, error-filled transcripts. Fine-tune downstream models - summarization, entity extraction, routing - to be robust to the disfluent, messy input they'll actually encounter.
Post-ASR NLP
Entity Extraction
Summarization
Technical Specs
Format
Dual-channel audio (agent + caller separated)
Sample Rate
8 kHz / 16-bit PCM
Transcripts
Fully diarized with speaker-aligned timestamps
Native Annotations (Included)
Word-Level Timestamps
PII Redaction
Geography
CSAT
Intent
Sentiment
Overtalk
Human Annotations (Available)
Emotions
Accent & Dialect
Code Switching
Disfluency
Social Dynamics
Cultural Patterns
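For pipelines that process each speaker separately, the dual-channel, 16-bit PCM format listed in the specs can be de-interleaved with the Python standard library alone. This is a minimal sketch that assumes the audio is delivered as a standard two-channel WAV file; the container format, the function name, and which channel carries the agent versus the caller are all assumptions to confirm against the delivered files.

```python
import array
import wave
from typing import Tuple


def split_dual_channel(path: str) -> Tuple[bytes, bytes, int]:
    """Return (channel_0_pcm, channel_1_pcm, sample_rate) for a stereo WAV.

    Assumes 2-channel, 16-bit PCM input. Which channel holds the agent
    and which holds the caller is not specified here.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 2-channel, 16-bit PCM audio")
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    # Samples are interleaved: ch0, ch1, ch0, ch1, ... (2 bytes each).
    samples = array.array("h")
    samples.frombytes(frames)
    channel_0 = samples[0::2]
    channel_1 = samples[1::2]
    return channel_0.tobytes(), channel_1.tobytes(), sample_rate
```

Each returned byte string is mono 16-bit PCM at the original sample rate and can be written back out with `wave` or passed to a feature extractor.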
Privacy & Compliance
All personally identifiable information has been removed from both audio and transcripts. The data is fully compliant with CCPA, GDPR, and applicable privacy regulations - ready for use in model training, evaluation, and research without additional redaction or legal review.
Provenance
Every hour of audio originates from the global financial services industry, serving millions of customers across 200+ countries. Calls span account inquiries, transaction disputes, fraud resolution, compliance verification, and multilingual support interactions.
Access Our 150-Hour Sample

