Documentation

Quick Start Guide

1. Purchase a License

Create an account and purchase a Backend Server, Frontend Client, or Bundle license from the portal.

2. Install the Backend Server

Download and install the backend on a machine with an NVIDIA GPU. Enter your license key when prompted.

3. Install the Frontend Client

Download and install the desktop client on any Windows, macOS, or Linux machine with network access to the backend server.

4. Start Transcribing

Point the client at your backend server, add files, and start batch transcription.

Backend Server Setup

Prerequisites

  • NVIDIA GPU with 8GB+ VRAM (10GB+ for the large-v3 model) and CUDA 11.8 or 12.x installed
  • Python 3.10 or newer
  • ffmpeg
  • Windows 10/11 or Ubuntu 22.04+
  • 16GB RAM minimum
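To verify a machine before installing, you can approximate part of the installer's prerequisite check with a short Python script. This is a minimal sketch, not the logic of install.sh or install.ps1. It covers only Python 3.10+, ffmpeg, and GPU VRAM, and it assumes `nvidia-smi` (installed with the NVIDIA driver) is on the PATH. The `MIN_VRAM_MIB` threshold is an illustrative choice.

```python
"""Pre-install check for some backend prerequisites (a sketch, not the bundled installer)."""
import shutil
import subprocess
import sys
from typing import Dict, List, Tuple

MIN_PYTHON = (3, 10)
# "8GB+ VRAM": leave some slack, since nvidia-smi may report an 8 GB card
# as slightly under 8192 MiB. Illustrative threshold, not an official one.
MIN_VRAM_MIB = 7900


def python_ok(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """True if the (major, minor, ...) version is 3.10 or newer."""
    return tuple(version[:2]) >= MIN_PYTHON


def parse_vram_mib(csv_output: str) -> List[int]:
    """Parse `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`
    output: one integer (MiB) per line, one line per GPU."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]


def gpu_vram_mib() -> List[int]:
    """Total VRAM per NVIDIA GPU in MiB; empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True, timeout=30,
        )
        return parse_vram_mib(result.stdout)
    except (OSError, subprocess.SubprocessError, ValueError):
        return []


def check_prerequisites() -> Dict[str, bool]:
    """Map each checked prerequisite to whether this machine appears to meet it."""
    return {
        "Python 3.10+": python_ok(),
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        "NVIDIA GPU with 8GB+ VRAM": any(v >= MIN_VRAM_MIB for v in gpu_vram_mib()),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"[{'OK' if ok else 'MISSING'}] {name}")
```

The script does not check the CUDA version, operating system, or system RAM. Confirm those separately.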

Option 1: Automated Installer (Recommended)

The installer handles everything — Python venv, dependencies, license activation, and service setup.

# Linux
sudo bash install.sh

# Windows (PowerShell as Administrator)
powershell -ExecutionPolicy Bypass -File install.ps1

The installer will:

  1. Check prerequisites (Python 3.10+, GPU, ffmpeg)
  2. Create a Python virtual environment and install dependencies
  3. Prompt for your license key and activate it
  4. Create a system service (systemd on Linux, scheduled task on Windows)
  5. Start the server automatically

Option 2: Docker

Requires Docker with NVIDIA Container Toolkit.

# Set your license key
export LICENSE_KEY=SM-XXXX-XXXX-XXXX-XXXX

# Start with GPU support
docker compose up -d

Option 3: Manual Setup

# Create virtual environment
python3 -m venv .venv              # Windows: python -m venv .venv
source .venv/bin/activate          # Linux
# .venv\Scripts\Activate.ps1       # Windows (PowerShell)

# Install dependencies
pip install -r requirements.txt

# Copy config template
cp config.yaml.example config.yaml
# Edit config.yaml — set your license key

# Start the server
python run.py

Configuration

Settings are in config.yaml. Key options can also be set via environment variables:

LICENSE_KEY=SM-XXXX-XXXX-XXXX-XXXX    # License key
HUGGINGFACE_TOKEN=hf_xxxxx             # For speaker diarization
WHISPER_MODEL=large-v3                 # Default model
WHISPER_DEVICE=cuda                    # cuda or cpu

The server validates your license on startup. With a valid license, all features are unlocked. Without a key, it runs in trial mode.
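The Python function below sketches how file settings and environment variables might combine. It rests on several assumptions. The config.yaml key names (`license_key`, `whisper_model`, and so on) are hypothetical. Environment variables are assumed to override file values. The defaults mirror the example values above. Parsing config.yaml itself (for example with PyYAML) is left out so the sketch needs only the standard library.

```python
import os
from typing import Dict, Mapping

# Hypothetical mapping from environment variable to config.yaml key.
ENV_TO_KEY = {
    "LICENSE_KEY": "license_key",
    "HUGGINGFACE_TOKEN": "huggingface_token",
    "WHISPER_MODEL": "whisper_model",
    "WHISPER_DEVICE": "whisper_device",
}
VALID_MODELS = {"tiny", "base", "small", "medium", "large-v3"}
VALID_DEVICES = {"cuda", "cpu"}


def effective_config(file_config: Mapping[str, str],
                     env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Merge config.yaml values with environment overrides (env wins, by assumption)."""
    config: Dict[str, object] = dict(file_config)
    for var, key in ENV_TO_KEY.items():
        if env.get(var):  # ignore unset or empty variables
            config[key] = env[var]
    config.setdefault("whisper_model", "large-v3")  # assumed default
    config.setdefault("whisper_device", "cuda")     # assumed default
    if config["whisper_model"] not in VALID_MODELS:
        raise ValueError(f"unknown model: {config['whisper_model']!r}")
    if config["whisper_device"] not in VALID_DEVICES:
        raise ValueError(f"WHISPER_DEVICE must be cuda or cpu, got {config['whisper_device']!r}")
    # No license key at all -> trial mode. Whether a key is valid is decided at startup.
    config["trial_mode"] = not config.get("license_key")
    return config
```

For example, `effective_config({"whisper_model": "medium"}, env={"WHISPER_DEVICE": "cpu"})` keeps the file's model, switches to CPU, and reports trial mode because no key is set.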

Frontend Client Setup

The frontend is a lightweight desktop app — it connects to your backend server over the network. No GPU needed on the client machine.

Installation

Download the installer for your platform from your portal dashboard:

  • Windows: SpeakerMap-Setup.exe or .msi
  • macOS: SpeakerMap.dmg
  • Linux: .deb or .AppImage

First Launch

  1. Enter your frontend license key (or skip for trial mode)
  2. Enter your backend server URL (e.g., http://192.168.1.100:8765)
  3. The app connects and you're ready to transcribe

The license is tied to your hardware. It activates automatically and keeps working offline after the initial activation, subject to the periodic online validation described under Offline Usage.
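If the client cannot connect, first check from the client machine whether the backend's host and port accept TCP connections at all. The Python sketch below does only that. A successful connection shows the port is open and reachable through firewalls and routing. It does not prove the listening process is the Speaker Map backend. Port 8765 comes from the example URL above; use whatever port your server is configured for.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def parse_backend_url(url: str) -> Tuple[str, int]:
    """Split a backend URL such as http://192.168.1.100:8765 into (host, port)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"expected an http:// or https:// URL, got {url!r}")
    # Fall back to the scheme's default port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def backend_reachable(url: str, timeout: float = 3.0) -> bool:
    """True if a TCP connection to the backend's host:port succeeds within `timeout` seconds."""
    host, port = parse_backend_url(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or DNS failure
        return False
```

Usage: `backend_reachable("http://192.168.1.100:8765")`. A `False` result usually points to a wrong address, a stopped server, or a firewall blocking the port.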

Using the Client

  1. Add Files: Drag and drop audio/video files or use the file browser
  2. Select Model: Choose a Whisper model size (tiny, base, small, medium, large-v3)
  3. Configure: Set language, output format, and diarization options
  4. Process: Click "Start" to begin batch transcription
  5. Export: Download completed transcripts in your chosen format

Whisper Models

Model      Parameters   VRAM     Speed            Accuracy
tiny       39M          ~1 GB    ~32x realtime    Basic
base       74M          ~1 GB    ~16x realtime    Good
small      244M         ~2 GB    ~8x realtime     Better
medium     769M         ~5 GB    ~4x realtime     Great
large-v3   1550M        ~10 GB   ~2x realtime     Best

Speeds are approximate, measured on an RTX 4070 Ti. "2x realtime" means one hour of audio takes about 30 minutes to process. Actual performance varies with GPU, audio quality, and file length.
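The table answers two practical questions: which model fits a given GPU, and roughly how long a file will take. The Python sketch below answers both using only the approximate figures from the table. It is a rough lookup, not a benchmark, and real VRAM use and speed will differ.

```python
from typing import List, Optional, Tuple

# (model, approx. VRAM in GB, approx. speed as a multiple of realtime), from the table above.
MODELS: List[Tuple[str, float, float]] = [
    ("tiny", 1, 32),
    ("base", 1, 16),
    ("small", 2, 8),
    ("medium", 5, 4),
    ("large-v3", 10, 2),
]


def largest_model_for(vram_gb: float) -> Optional[str]:
    """Largest model whose approximate VRAM need fits in `vram_gb`, or None."""
    fitting = [name for name, need_gb, _ in MODELS if need_gb <= vram_gb]
    return fitting[-1] if fitting else None


def estimated_minutes(audio_minutes: float, model: str) -> float:
    """Processing time = audio length / realtime multiple (60 min at 2x -> 30 min)."""
    speed = {name: multiple for name, _, multiple in MODELS}[model]
    return audio_minutes / speed
```

For example, an 8 GB card maps to medium. That matches the FAQ's recommendation of 10GB+ VRAM for large-v3.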

Output Formats

TXT (Plain Text)

Simple text output with optional speaker labels. Best for reading and searching.

SRT (SubRip Subtitle)

Standard subtitle format with timestamps. Compatible with video players and editors.

VTT (WebVTT)

Web-native subtitle format. Ideal for web players and streaming platforms.

JSON (Structured Data)

Full metadata including timestamps, confidence scores, speaker IDs, and word-level timing.
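SRT and WebVTT differ mainly in small details. SRT numbers each cue and puts a comma before the milliseconds. WebVTT starts with a `WEBVTT` header and uses a period. The Python sketch below renders the same segments both ways. The `Segment` structure and the speaker-label styles (`[SPEAKER_00]` in SRT, a WebVTT `<v>` voice tag in VTT) are illustrative assumptions, not necessarily what Speaker Map emits.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Segment:
    start: float                   # seconds
    end: float                     # seconds
    text: str
    speaker: Optional[str] = None  # e.g. "SPEAKER_00" when diarization is on


def fmt_timestamp(seconds: float, ms_sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (',' for SRT, '.' for WebVTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{ms_sep}{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Numbered cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        text = f"[{seg.speaker}] {seg.text}" if seg.speaker else seg.text
        timing = f"{fmt_timestamp(seg.start, ',')} --> {fmt_timestamp(seg.end, ',')}"
        cues.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(cues)


def to_vtt(segments: Iterable[Segment]) -> str:
    """'WEBVTT' header, then unnumbered cues; speakers use the <v> voice tag."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        text = f"<v {seg.speaker}>{seg.text}" if seg.speaker else seg.text
        lines += [f"{fmt_timestamp(seg.start, '.')} --> {fmt_timestamp(seg.end, '.')}", text, ""]
    return "\n".join(lines)
```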

License Management

Activating Your License

When you start the backend server or frontend client with a license key, it automatically activates on that machine. Each license has a limited number of activations:

  • Backend Server: 1 activation (1 server)
  • Frontend Client: 1 activation (1 workstation)
  • Bundle: 2 activations (1 server + 1 workstation)

Moving to a New Machine

If you need to move your license to new hardware:

  1. Log in to your account at think3d.ca/portal/licenses
  2. Find the license and click "Deactivate" next to the old machine
  3. Install Speaker Map on the new machine and enter your license key

No need to contact support — self-service re-registration is available 24/7.
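The activation limits and the deactivate-then-reactivate flow can be modeled as seat counting. The Python sketch below is a hypothetical illustration of the rules listed above, not the portal's implementation. The `License` class and the machine IDs are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# (server seats, workstation seats) per license type, from the activation list above.
SEATS: Dict[str, Tuple[int, int]] = {
    "backend": (1, 0),
    "frontend": (0, 1),
    "bundle": (1, 1),
}


@dataclass
class License:
    kind: str  # "backend", "frontend", or "bundle"
    servers: Set[str] = field(default_factory=set)       # hypothetical machine IDs
    workstations: Set[str] = field(default_factory=set)

    def activate(self, machine_id: str, role: str) -> None:
        """Take a seat for `machine_id` as a "server" or "workstation"."""
        max_servers, max_workstations = SEATS[self.kind]
        if role == "server":
            used, limit = self.servers, max_servers
        elif role == "workstation":
            used, limit = self.workstations, max_workstations
        else:
            raise ValueError(f"unknown role: {role!r}")
        if machine_id in used:
            return  # already active on this machine
        if len(used) >= limit:
            raise PermissionError(f"no free {role} activation; deactivate the old machine first")
        used.add(machine_id)

    def deactivate(self, machine_id: str) -> None:
        """Free the machine's seat (analogous to the portal's "Deactivate" button)."""
        self.servers.discard(machine_id)
        self.workstations.discard(machine_id)
```

Under this model, a Bundle cannot activate two servers. Moving the server means deactivating the old one first.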

Offline Usage

Speaker Map validates your license online every 7 days. If the machine cannot reach the license server, you have a 30-day grace period before the license needs to re-validate. The software continues working normally during this period.
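The timing rule can be written as a small function of the time since the last successful validation. This is a minimal sketch under one assumption the text leaves open: the 30-day grace period is counted from the last successful validation, not from the first missed check.

```python
from datetime import datetime, timedelta

VALIDATION_INTERVAL = timedelta(days=7)  # online check every 7 days
GRACE_PERIOD = timedelta(days=30)        # assumed to run from the last successful check


def license_status(last_validated: datetime, now: datetime) -> str:
    """Classify a license by how long ago it last validated online."""
    elapsed = now - last_validated
    if elapsed < VALIDATION_INTERVAL:
        return "valid"                  # no check due yet
    if elapsed < GRACE_PERIOD:
        return "grace"                  # check due; keeps working while offline
    return "revalidation-required"      # must reach the license server again
```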

Supported File Formats

Audio

MP3, WAV, FLAC, OGG, M4A, AAC, WMA, AIFF

Video

MP4, MKV, AVI, MOV, WMV, FLV, WebM

Audio is extracted automatically from video files before transcription.
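The Python sketch below routes a file by its extension and pulls the audio track out of a video. The extension sets mirror the lists above. The ffmpeg options are standard flags: `-vn` drops video, `-ac 1` selects mono, and `-ar 16000` sets a 16 kHz sample rate. They were chosen because Whisper works on 16 kHz mono audio. Speaker Map's actual extraction settings are not documented here.

```python
import subprocess
from pathlib import Path
from typing import List

AUDIO_EXTS = {".mp3", ".wav", ".flac", ".ogg", ".m4a", ".aac", ".wma", ".aiff"}
VIDEO_EXTS = {".mp4", ".mkv", ".avi", ".mov", ".wmv", ".flv", ".webm"}


def classify(path: str) -> str:
    """Return "audio" or "video" based on the (case-insensitive) file extension."""
    ext = Path(path).suffix.lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unsupported file type: {ext or '(no extension)'}")


def extract_audio_cmd(video_path: str, wav_path: str) -> List[str]:
    """ffmpeg command: drop the video stream, write 16 kHz mono 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", video_path,
            "-vn", "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", wav_path]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run the extraction; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(extract_audio_cmd(video_path, wav_path), check=True)
```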

FAQ

How accurate is the transcription?

Using the large-v3 model, Speaker Map achieves word error rates (WER) comparable to commercial cloud services, typically under 5% for clear English audio. Accuracy improves with higher-quality audio and larger models. A more capable GPU mainly speeds up processing and makes the larger models usable.
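WER counts the word substitutions (S), deletions (D), and insertions (I) needed to turn a transcript into the reference transcript, divided by the number of reference words N: WER = (S + D + I) / N. A WER under 5% therefore means fewer than about one error per 20 words. Below is a minimal Python implementation using word-level edit distance. Published benchmarks usually normalize casing and punctuation first, which this sketch does not do.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,                           # deletion (reference word missed)
                cur[j - 1] + 1,                        # insertion (extra word in transcript)
                prev[j - 1] + (ref_word != hyp_word),  # substitution, or a match at no cost
            )
        prev = cur
    return prev[-1] / len(ref)
```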

What languages are supported?

Whisper supports 99 languages for transcription and translation to English. Language can be auto-detected or manually specified.

Can multiple users share one backend server?

Yes. One backend server can handle requests from multiple frontend clients. Each frontend client needs its own license, but the backend server only needs one license.

Is my data sent to the cloud?

No. All transcription processing happens locally on your hardware, and audio travels only between the frontend client and your own backend server. The only external network communication is periodic license validation with our server (a small metadata check; no audio data is transmitted).

What GPU do I need?

Any NVIDIA GPU with 8GB+ VRAM and CUDA support. Recommended: RTX 3060 12GB or better. For the large-v3 model, 10GB+ VRAM is recommended.

Need help? Contact us at [email protected]