Documentation
Quick Start Guide
Purchase a License
Create an account and purchase a Backend Server, Frontend Client, or Bundle license from the portal.
Install the Backend Server
Download and install the backend on a machine with an NVIDIA GPU. Enter your license key when prompted.
Install the Frontend Client
Download and install the desktop client on any Windows, macOS, or Linux machine with network access to the backend server.
Start Transcribing
Point the client at your backend server, add files, and start batch transcription.
Backend Server Setup
Prerequisites
- NVIDIA GPU with 8GB+ VRAM and CUDA 11.8 or 12.x installed
- Python 3.10 or newer
- Windows 10/11 or Ubuntu 22.04+
- 16GB RAM minimum
Option 1: Automated Installer (Recommended)
The installer handles everything — Python venv, dependencies, license activation, and service setup.
    # Linux
    sudo bash install.sh

    # Windows (PowerShell as Administrator)
    powershell -ExecutionPolicy Bypass -File install.ps1
The installer will:
- Check prerequisites (Python 3.10+, GPU, ffmpeg)
- Create a Python virtual environment and install dependencies
- Prompt for your license key and activate it
- Create a system service (systemd on Linux, scheduled task on Windows)
- Start the server automatically
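If you want to confirm the prerequisites yourself before running the installer, a check along the following lines can help. This is a minimal sketch, not the installer's actual logic: the function name `check_prerequisites` is hypothetical, and using the presence of `nvidia-smi` on the PATH as a proxy for a working NVIDIA GPU is an assumption.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a dict mapping each prerequisite name to True or False."""
    return {
        # Python 3.10 or newer, per the Prerequisites list
        "python": sys.version_info >= min_python,
        # The installer checks for ffmpeg; here we only look for it on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Assumption: nvidia-smi ships with the NVIDIA driver, so finding it
        # on PATH is treated as a rough sign that a GPU driver is installed
        "nvidia_gpu": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

The check does not verify VRAM size or the CUDA version, so a passing result is necessary but not sufficient.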
Option 2: Docker
Requires Docker with NVIDIA Container Toolkit.
    # Set your license key
    export LICENSE_KEY=SM-XXXX-XXXX-XXXX-XXXX

    # Start with GPU support
    docker compose up -d
Option 3: Manual Setup
    # Create virtual environment
    python3 -m venv .venv
    source .venv/bin/activate      # Linux
    # .venv\Scripts\activate       # Windows

    # Install dependencies
    pip install -r requirements.txt

    # Copy config template
    cp config.yaml.example config.yaml
    # Edit config.yaml and set your license key

    # Start the server
    python run.py
Configuration
Settings are in config.yaml. Key options can also be set via environment variables:
    LICENSE_KEY=SM-XXXX-XXXX-XXXX-XXXX    # License key
    HUGGINGFACE_TOKEN=hf_xxxxx            # For speaker diarization
    WHISPER_MODEL=large-v3                # Default model
    WHISPER_DEVICE=cuda                   # cuda or cpu
The server validates your license on startup. With a valid license, all features are unlocked. Without a key, it runs in trial mode.
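To illustrate how file settings and environment variables might combine, here is a minimal sketch. It is not the server's actual loader. It assumes that environment variables take precedence over values from config.yaml, and that the file uses the same key names as the variables. The function `load_settings`, the `DEFAULTS` fallbacks, and the `trial_mode` flag are hypothetical names for illustration.

```python
import os

# Assumption: fallback values mirror the example above; they are not
# documented defaults of the server.
DEFAULTS = {
    "LICENSE_KEY": "",
    "HUGGINGFACE_TOKEN": "",
    "WHISPER_MODEL": "large-v3",
    "WHISPER_DEVICE": "cuda",
}


def load_settings(file_settings=None, env=None):
    """Merge defaults, file settings, and environment (highest priority)."""
    env = os.environ if env is None else env
    settings = dict(DEFAULTS)
    settings.update(file_settings or {})
    for key in DEFAULTS:
        if env.get(key):  # ignore unset or empty variables
            settings[key] = env[key]
    if settings["WHISPER_DEVICE"] not in ("cuda", "cpu"):
        raise ValueError("WHISPER_DEVICE must be 'cuda' or 'cpu'")
    # Without a key, the server runs in trial mode
    settings["trial_mode"] = not settings["LICENSE_KEY"]
    return settings


if __name__ == "__main__":
    print(load_settings({"WHISPER_MODEL": "medium"}))
```

Passing `env` explicitly, as in `load_settings({}, env={"WHISPER_DEVICE": "cpu"})`, makes the merge easy to try out without touching the real environment.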
Frontend Client Setup
The frontend is a lightweight desktop app — it connects to your backend server over the network. No GPU needed on the client machine.
Installation
Download the installer for your platform from your portal dashboard:
- Windows: SpeakerMap-Setup.exe or .msi
- macOS: SpeakerMap.dmg
- Linux: .deb or .AppImage
First Launch
- Enter your frontend license key (or skip for trial mode)
- Enter your backend server URL (e.g., http://192.168.1.100:8765)
- The app connects and you're ready to transcribe
The license is tied to your hardware. It activates automatically on first launch and keeps working offline between the periodic re-validations described under Offline Usage.
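If the client cannot connect on first launch, it can help to confirm that the backend URL is well formed and that the port accepts connections from the client machine. The following is a troubleshooting sketch only. The helper names `parse_backend_url` and `is_reachable` are hypothetical, and a successful TCP connection shows only that something is listening on that port, not that it is a Speaker Map server.

```python
import socket
from urllib.parse import urlparse


def parse_backend_url(url):
    """Split a backend URL into (host, port); raise ValueError if malformed."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not a valid backend URL: {url!r}")
    # Fall back to the scheme's standard port when none is given
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = parse_backend_url("http://192.168.1.100:8765")
    print(f"{host}:{port} reachable: {is_reachable(host, port)}")
```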
Using the Client
- Add Files: Drag and drop audio/video files or use the file browser
- Select Model: Choose a Whisper model size (tiny, base, small, medium, large-v3)
- Configure: Set language, output format, and diarization options
- Process: Click "Start" to begin batch transcription
- Export: Download completed transcripts in your chosen format
Whisper Models
| Model | Parameters | VRAM | Speed | Accuracy |
|---|---|---|---|---|
| tiny | 39M | ~1 GB | ~32x realtime | Basic |
| base | 74M | ~1 GB | ~16x realtime | Good |
| small | 244M | ~2 GB | ~8x realtime | Better |
| medium | 769M | ~5 GB | ~4x realtime | Great |
| large-v3 | 1550M | ~10 GB | ~2x realtime | Best |
Speed is approximate on an RTX 4070 Ti. Actual performance varies with GPU, audio quality, and file length.
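The table can be turned into a simple rule of thumb: pick the largest model whose approximate VRAM requirement fits on your GPU. The sketch below does this using the VRAM figures from the table. The function name `pick_model` is hypothetical, and the figures are approximate, so leave some headroom in practice.

```python
# (model, approximate VRAM in GB) from the table above, smallest first
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def pick_model(available_vram_gb):
    """Return the largest model that fits, or None if none does."""
    best = None
    for name, needed_gb in MODEL_VRAM_GB:
        if needed_gb <= available_vram_gb:
            best = name  # later entries are larger, so keep overwriting
    return best


if __name__ == "__main__":
    for vram in (4, 8, 12):
        print(f"{vram} GB -> {pick_model(vram)}")
```

With these figures, an 8 GB card maps to medium and a 12 GB card to large-v3.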
Output Formats
TXT (Plain Text)
Simple text output with optional speaker labels. Best for reading and searching.
SRT (SubRip Subtitle)
Standard subtitle format with timestamps. Compatible with video players and editors.
VTT (WebVTT)
Web-native subtitle format. Ideal for web players and streaming platforms.
JSON (Structured Data)
Full metadata including timestamps, confidence scores, speaker IDs, and word-level timing.
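To show how the formats relate, the sketch below converts a list of timed segments into SRT text. The segment structure (dicts with `start`, `end`, `text`, and an optional `speaker`) is a hypothetical stand-in, not the documented schema of Speaker Map's JSON output. The SRT layout itself (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) follows the standard SubRip convention.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments as SRT cues, numbered from 1."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        text = seg["text"].strip()
        if seg.get("speaker"):
            # Prefix the cue with a speaker label when one is present
            text = f"[{seg['speaker']}] {text}"
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)  # blank line between cues


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": "Hello.", "speaker": "SPEAKER_00"},
        {"start": 2.5, "end": 5.0, "text": "Hi there."},
    ]
    print(to_srt(demo))
```

WebVTT differs mainly in its `WEBVTT` header line and in using a period instead of a comma before the milliseconds.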
License Management
Activating Your License
When you start the backend server or frontend client with a license key, it automatically activates on that machine. Each license has a limited number of activations:
- Backend Server: 1 activation (1 server)
- Frontend Client: 1 activation (1 workstation)
- Bundle: 2 activations (1 server + 1 workstation)
Moving to a New Machine
If you need to move your license to new hardware:
- Log in to your account at think3d.ca/portal/licenses
- Find the license and click "Deactivate" on the old machine
- Install Speaker Map on the new machine and enter your license key
No need to contact support — self-service re-registration is available 24/7.
Offline Usage
Speaker Map validates your license online every 7 days. If your server is offline, you have a 30-day grace period before the license needs to re-validate. The software continues working normally during this period.
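The timing rules can be pictured as a small state function. This is an illustrative sketch, not Speaker Map's actual licensing code. It assumes the 30-day grace period is measured from the last successful online validation. The state names and the function `license_state` are hypothetical.

```python
from datetime import datetime, timedelta

VALIDATION_INTERVAL = timedelta(days=7)   # online check every 7 days
GRACE_PERIOD = timedelta(days=30)         # offline grace period


def license_state(last_validated, now):
    """Classify the license based on time since the last online validation."""
    elapsed = now - last_validated
    if elapsed < VALIDATION_INTERVAL:
        return "valid"          # no online check needed yet
    if elapsed <= GRACE_PERIOD:
        return "recheck_due"    # still works, but should validate online
    return "expired"            # grace period used up


if __name__ == "__main__":
    last = datetime(2024, 1, 1)
    for days in (3, 10, 31):
        print(days, license_state(last, last + timedelta(days=days)))
```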
Supported File Formats
Audio
MP3, WAV, FLAC, OGG, M4A, AAC, WMA, AIFF
Video
MP4, MKV, AVI, MOV, WMV, FLV, WebM
Audio is extracted automatically from video files before transcription.
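When preparing a batch, it can be handy to filter a folder down to supported files first. The sketch below classifies paths by extension using the lists above. It is a convenience example with hypothetical names (`classify`, `AUDIO_EXTENSIONS`, `VIDEO_EXTENSIONS`), and matching on the file extension alone is an assumption. The client may inspect files differently.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".m4a", ".aac", ".wma", ".aiff"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".wmv", ".flv", ".webm"}


def classify(path):
    """Return 'audio', 'video', or None based on the file extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None


if __name__ == "__main__":
    for name in ("interview.MP3", "meeting.webm", "notes.pdf"):
        print(name, "->", classify(name))
```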
FAQ
How accurate is the transcription?
Using the large-v3 model, Speaker Map achieves word error rates (WER) comparable to commercial cloud services, typically under 5% for clear English audio. Accuracy improves with higher-quality audio and with larger models, which in turn need more GPU memory.
What languages are supported?
Whisper supports 99 languages for transcription and translation to English. Language can be auto-detected or manually specified.
Can multiple users share one backend server?
Yes. One backend server can handle requests from multiple frontend clients. Each frontend client needs its own license, but the backend server only needs one license.
Is my data sent to the cloud?
No. All transcription processing happens locally on your hardware. The only network communication is periodic license validation with our server (a small metadata check — no audio data is transmitted).
What GPU do I need?
Any NVIDIA GPU with 8GB+ VRAM and CUDA support. Recommended: RTX 3060 12GB or better. For the large-v3 model, 10GB+ VRAM is recommended.
Need help? Contact us at [email protected]