Speech Recognition (ASR)

Overview

Your AI agent is only as good as its ability to understand what callers are saying. Voquii's Speech Recognition engine is built to transcribe callers accurately in challenging real-world conditions: background noise, accents, industry jargon, and rapid speech.

Voquii's transcription is powered by Deepgram's speech recognition models. Low latency and high accuracy allow conversations with your AI agent to flow like human dialogue.

Real-Time Transcription

Voquii transcribes speech as it happens—not after. This real-time capability is the foundation of natural conversation:

Capability	Description	Business Impact
Streaming Recognition	Words transcribed as spoken	No awkward waiting
Interim Results	See partial words forming	Faster response preparation
Final Results	Polished, accurate transcript	Reliable data capture
Continuous Listening	No timeouts or cutoffs	Complete conversations
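
To illustrate how interim and final results fit together, here is a minimal Python sketch. The TranscriptBuffer class and the (text, is_final) event shape are assumptions made for this example and are not Voquii's or Deepgram's actual API. The idea is that interim results overwrite each other while a phrase is still forming, and a final result locks that phrase in.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Accumulates streaming results (hypothetical event shape, for illustration)."""
    finals: list = field(default_factory=list)
    interim: str = ""

    def on_result(self, text: str, is_final: bool) -> None:
        if is_final:
            # A final result replaces the interim hypothesis for this segment.
            self.finals.append(text)
            self.interim = ""
        else:
            # Interim results overwrite each other until the segment is finalized.
            self.interim = text

    def current_text(self) -> str:
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)


buf = TranscriptBuffer()
buf.on_result("I'd like to", is_final=False)
buf.on_result("I'd like to book", is_final=False)
buf.on_result("I'd like to book a table.", is_final=True)
buf.on_result("for two", is_final=False)
print(buf.current_text())  # I'd like to book a table. for two
```

An agent can start preparing a response from current_text() while the caller is still talking, but should store only the finals list as the record of the call.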

Accuracy Comparison

Metric	Industry Average	Voquii/Deepgram
Word Error Rate (WER)	15-25%	<8%
Recognition Speed	1-2x real-time	<300ms latency
Accuracy with Accents	70-80%	>90%
Accuracy in Noisy Environments	60-75%	>85%
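
Word Error Rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below computes it with a standard word-level edit distance. The function name and the sample sentences are made up for illustration, and this is not necessarily how the figures in the table were measured.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate(
    "please cancel my appointment on friday",
    "please cancel my appointment friday",
)
print(f"{wer:.1%}")  # 16.7% (one deleted word out of six)
```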

Multiple Language Support

Voquii supports the following languages, grouped by expected accuracy:

Tier 1 Languages (Highest Accuracy)

English (US, UK, AU) — 96-97%+ accuracy
Spanish (Mexican, Castilian, Latin American) — 95%+
French (France, Canadian) — 95%+
German (Standard, Austrian, Swiss) — 95%+

Tier 2 Languages (Excellent Accuracy)

Portuguese, Italian, Dutch, Japanese, Mandarin Chinese, Korean, Hindi

Barge-In Detection

Nothing frustrates callers more than being forced to listen to an entire message before responding. Voquii's barge-in detection lets callers interrupt naturally—just like talking to a human.

Metric	Without Barge-In	With Barge-In
Average Call Duration	4:30	3:15 (-28%)
Caller Satisfaction	72%	89% (+17 pts)
Callers Repeating Information	45% of calls	12% of calls
Caller Frustration Events	23%	6%

"Barge-in was the feature that made our AI agent feel human. Callers stopped complaining about 'being talked at' and started having real conversations."
— Retail Operations Director
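
The barge-in behavior described in this section can be sketched as a small state machine: while the agent is speaking, sustained caller speech cuts playback and hands the turn back to the caller. Everything here is an assumption for illustration, including the BargeInController name, the 20 ms frame size, and the 200 ms minimum-speech guard. It is not Voquii's implementation.

```python
from enum import Enum


class AgentState(Enum):
    LISTENING = "listening"
    SPEAKING = "speaking"


class BargeInController:
    """Stops agent playback when the caller starts talking (illustrative only)."""

    def __init__(self, min_speech_ms: int = 200):
        # Require a short run of speech so a cough or click does not interrupt.
        self.min_speech_ms = min_speech_ms
        self.state = AgentState.LISTENING
        self._speech_ms = 0

    def start_speaking(self) -> None:
        self.state = AgentState.SPEAKING
        self._speech_ms = 0

    def on_audio_frame(self, caller_is_speaking: bool, frame_ms: int = 20) -> bool:
        """Return True when agent playback should be cut off."""
        if self.state is not AgentState.SPEAKING:
            return False
        # Count consecutive speech; any silent frame resets the counter.
        self._speech_ms = self._speech_ms + frame_ms if caller_is_speaking else 0
        if self._speech_ms >= self.min_speech_ms:
            self.state = AgentState.LISTENING
            return True
        return False


ctrl = BargeInController()
ctrl.start_speaking()
frames = [False] * 5 + [True] * 12  # 100 ms of silence, then the caller talks
interrupted_at = next(i for i, f in enumerate(frames) if ctrl.on_audio_frame(f))
print(interrupted_at, ctrl.state.value)  # 14 listening
```

The minimum-speech guard trades a little responsiveness for fewer false interruptions. A real system would tune it against actual call audio.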

Endpointing Configuration

Endpointing decides when a caller has finished speaking. Voquii offers four configurable silence thresholds:

Setting	Silence Duration	Use Case
Quick	500ms	Fast-paced calls, yes/no questions
Standard	800ms	General conversations
Patient	1200ms	Complex questions, elderly callers
Extended	2000ms	Thoughtful responses, calculations
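
As a rough illustration of how a silence threshold works, the sketch below scans per-frame voice activity and declares the utterance finished once the silence after speech reaches the preset duration. The preset values come from the table above. The function name, the lowercase preset keys, and the 20 ms frame size are assumptions made for this example.

```python
from typing import List, Optional

# Silence durations from the table above, in milliseconds.
ENDPOINTING_PRESETS_MS = {
    "quick": 500,
    "standard": 800,
    "patient": 1200,
    "extended": 2000,
}


def utterance_end_ms(voiced_frames: List[bool], preset: str = "standard",
                     frame_ms: int = 20) -> Optional[int]:
    """Return the time (ms) at which the utterance is declared finished, or None."""
    threshold = ENDPOINTING_PRESETS_MS[preset]
    heard_speech = False
    silence_ms = 0
    for index, voiced in enumerate(voiced_frames):
        if voiced:
            heard_speech = True
            silence_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= threshold:
                return (index + 1) * frame_ms
    return None  # caller has not paused long enough yet


# 1.0 s of speech, a 0.6 s pause, 0.5 s more speech, then 1.5 s of silence.
frames = [True] * 50 + [False] * 30 + [True] * 25 + [False] * 75
print(utterance_end_ms(frames, "quick"))     # 1500 (cuts in during the mid-sentence pause)
print(utterance_end_ms(frames, "standard"))  # 2900 (waits for the caller to finish)
print(utterance_end_ms(frames, "extended"))  # None (still waiting)
```

The example shows the trade-off behind the presets: a short threshold responds faster but can cut off a caller who pauses mid-sentence, while a long one tolerates pauses at the cost of slower replies.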

Background Noise Handling

Voquii maintains accuracy even in challenging acoustic environments:

Office background chatter
Street noise and traffic
Construction sites
Speakerphone echo
Wind and outdoor environments

Advanced Features

Custom Vocabulary

Add industry-specific terms, product names, and jargon to improve recognition of those terms by 40-60%

Number Recognition

Accurate capture of phone numbers, credit card numbers, dates, and addresses
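
To give a sense of what number capture involves, here is a toy Python sketch that turns spoken digits into a formatted phone number. The function names and the US phone format are assumptions for illustration. It is a deliberately naive word lookup that would misread "one" in ordinary speech, and it is not how Voquii or Deepgram perform number recognition.

```python
import re

DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
REPEAT_WORDS = {"double": 2, "triple": 3}


def spoken_digits_to_string(transcript: str) -> str:
    """Collect digit words from a transcript, honoring 'double' and 'triple'."""
    digits = []
    repeat = 1
    for token in re.findall(r"[a-z]+", transcript.lower()):
        if token in REPEAT_WORDS:
            repeat = REPEAT_WORDS[token]
        elif token in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[token] * repeat)
            repeat = 1
        else:
            repeat = 1  # filler word: ignore it and reset any pending repeat
    return "".join(digits)


def format_us_phone(digits: str) -> str:
    """Format exactly ten digits as a US-style phone number."""
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


spoken = "My number is five five five, double oh one two three four five."
phone = format_us_phone(spoken_digits_to_string(spoken))
print(phone)  # (555) 001-2345
```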

Speaker Diarization

Identify and separate multiple speakers with "who said what" attribution
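
"Who said what" attribution usually starts from words that each carry a speaker label. The sketch below merges consecutive words from the same speaker into turns. The (speaker, word) tuple format and the to_turns helper are hypothetical and chosen only for this example. Actual diarization output will have its own structure.

```python
from itertools import groupby

# Hypothetical word-level output: (speaker_id, word)
words = [
    (0, "Hi,"), (0, "how"), (0, "can"), (0, "I"), (0, "help?"),
    (1, "I"), (1, "need"), (1, "to"), (1, "reschedule."),
    (0, "Sure."),
]


def to_turns(labeled_words):
    """Merge consecutive words from the same speaker into one turn."""
    return [
        (speaker, " ".join(word for _, word in group))
        for speaker, group in groupby(labeled_words, key=lambda item: item[0])
    ]


turns = to_turns(words)
for speaker, text in turns:
    print(f"Speaker {speaker}: {text}")
# Speaker 0: Hi, how can I help?
# Speaker 1: I need to reschedule.
# Speaker 0: Sure.
```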

Punctuation & Formatting

Automatic punctuation and sentence boundary detection

Compliance & Security

Standard	Status	Scope
SOC 2 Type II	✅ Certified	Full platform
HIPAA	✅ Compliant	Healthcare ready
GDPR	✅ Compliant	EU data protection
PCI DSS	✅ Compliant	Payment handling

ASR is included in your subscription — no additional per-minute charges for speech recognition.