Batch audio and video transcription with speaker diarization. Process hundreds of files with GPU-accelerated AI models.
Why Speaker Map?
GPU-Accelerated
Leverage NVIDIA GPUs for fast transcription. Process hours of audio in minutes with CUDA-optimized Whisper models.
Speaker Diarization
Automatically identify and label different speakers. Know who said what with speaker-aware transcription.
Batch Processing
Queue and process hundreds of audio and video files. Supports MP3, WAV, MP4, MKV, and more.
On-Premise
Run on your own hardware. Your data never leaves your network. Full control over your transcription pipeline.
Client-Server
The backend server runs the transcriptions, and the desktop client handles file management. Multiple users can share one server.
Multiple Formats
Export transcripts as TXT, SRT, VTT, or JSON. Timestamps, speaker labels, and confidence scores included.
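As a rough illustration of what a speaker-labelled SRT export might contain, here is a minimal Python sketch. The `Segment` class, its field names, and the `SPEAKER_00`-style label are assumptions made for this example and are not Speaker Map's actual data model or output; only the SRT cue layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text) follows the standard format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; field names are illustrative only."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # speaker label assigned by diarization
    text: str      # transcribed text


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues, prefixing each line with its speaker."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "SPEAKER_00", "Hello, thanks for joining."),
        Segment(2.5, 4.0, "SPEAKER_01", "Happy to be here."),
    ]
    print(to_srt(demo))
```

VTT output would look similar but uses a `WEBVTT` header and a period instead of a comma before the milliseconds.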
Pricing
One-time purchase. No subscriptions for the software.
Backend Server
$400 CAD
- ✓ GPU-accelerated transcription
- ✓ All Whisper model sizes
- ✓ Speaker diarization
- ✓ Unlimited file processing
- ✓ REST API access
- ✓ 1 server license
Frontend Client
$100 CAD / seat
- ✓ Desktop application
- ✓ Drag-and-drop file management
- ✓ Batch queue management
- ✓ Real-time progress tracking
- ✓ Export to TXT, SRT, VTT, JSON
- ✓ 1 seat license
Bundle
$500 CAD
- ✓ Backend Server license
- ✓ Frontend Client license
- ✓ All features included
- ✓ Save $100 over separate
- ✓ Priority setup support
- ✓ 1 server + 1 seat
System Requirements
Backend Server
- OS: Windows 10/11 or Ubuntu 22.04+
- GPU: NVIDIA GPU with 8GB+ VRAM (RTX 3060 or better)
- RAM: 16GB minimum
- Storage: 10GB for models, plus space for audio files
- CUDA: 11.8 or 12.x
- Python: 3.10+
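Before purchasing, you may want to confirm that a machine meets the backend requirements above. The following Python sketch is one way to do that, not an official installer check. It assumes the `nvidia-smi` tool that ships with the NVIDIA driver is on the PATH, and it treats "8GB" as 8192 MiB, which is an assumption about how the threshold should be interpreted. The function names are hypothetical.

```python
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 10)
MIN_VRAM_MIB = 8 * 1024  # assumption: "8GB+ VRAM" read as 8192 MiB


def parse_vram_mib(nvidia_smi_output: str) -> list[int]:
    """Parse nvidia-smi output that has one total-memory value (MiB) per GPU line."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def query_vram_mib() -> list[int]:
    """Return total VRAM per GPU in MiB, or an empty list if no GPU can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_vram_mib(result.stdout)


def check_backend_host() -> dict[str, bool]:
    """Check the Python version and GPU memory against the listed minimums."""
    vram = query_vram_mib()
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "gpu_found": bool(vram),
        "vram_ok": any(v >= MIN_VRAM_MIB for v in vram),
    }


if __name__ == "__main__":
    for name, ok in check_backend_host().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

This sketch does not verify the CUDA toolkit version, RAM, or free disk space, so those items still need to be checked separately.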
Frontend Client
- OS: Windows 10/11
- RAM: 4GB minimum
- Storage: 200MB for application
- Network: LAN access to backend server
Ready to get started?
Create an account, purchase a license, and start transcribing in minutes.
Create Account