Documentation
Quick Start Guide
Purchase a License
Create an account and purchase a Backend Server, Frontend Client, or Bundle license from the portal.
Install the Backend Server
Download and install the backend on a machine with an NVIDIA GPU. Enter your license key when prompted.
Install the Frontend Client
Download and install the desktop client on any Windows, macOS, or Linux machine with network access to the backend server.
Start Transcribing
Point the client at your backend server, add files, and start batch transcription.
Backend Server Setup
Prerequisites
- NVIDIA GPU with 8GB+ VRAM and CUDA 11.8 or 12.x installed
- Python 3.10 or newer
- Windows 10/11 or Ubuntu 22.04+
- 16GB RAM minimum
Option 1: Automated Installer (Recommended)
The installer handles everything — Python venv, dependencies, license activation, and service setup.
    # Linux
    sudo bash install.sh

    # Windows (PowerShell as Administrator)
    powershell -ExecutionPolicy Bypass -File install.ps1
The installer will:
- Check prerequisites (Python 3.10+, GPU, ffmpeg)
- Create a Python virtual environment and install dependencies
- Prompt for your license key and activate it
- Create a system service (systemd on Linux, scheduled task on Windows)
- Start the server automatically
Option 2: Docker
Requires Docker with NVIDIA Container Toolkit.
    # Set your license key
    export LICENSE_KEY=SM-XXXX-XXXX-XXXX-XXXX

    # Start with GPU support
    docker compose up -d
Option 3: Manual Setup
    # Create virtual environment
    python3 -m venv .venv
    source .venv/bin/activate      # Linux
    # .venv\Scripts\activate       # Windows

    # Install dependencies
    pip install -r requirements.txt

    # Copy config template
    cp config.yaml.example config.yaml
    # Edit config.yaml: set your license key

    # Start the server
    python run.py
Configuration
Settings are in config.yaml. Key options can also be set via environment variables:
    LICENSE_KEY=SM-XXXX-XXXX-XXXX-XXXX   # License key
    HUGGINGFACE_TOKEN=hf_xxxxx           # For speaker diarization
    WHISPER_MODEL=large-v3               # Default model
    WHISPER_DEVICE=cuda                  # cuda or cpu
The server validates your license on startup. With a valid license, all features are unlocked. Without a key, it runs in trial mode.
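The way file settings and environment variables combine can be sketched as follows. This is a minimal illustration, not the server's actual loader: the function name, the dictionary layout, and the rule that environment variables take precedence over config.yaml values are all assumptions. YAML parsing is left out, so file values are passed in as a plain dictionary.

```python
import os

# Defaults mirror the example values shown above; the layout is illustrative.
DEFAULTS = {
    "LICENSE_KEY": "",
    "HUGGINGFACE_TOKEN": "",
    "WHISPER_MODEL": "large-v3",
    "WHISPER_DEVICE": "cuda",
}


def load_settings(file_values=None, env=None):
    """Merge defaults, config-file values, and environment variables.

    Assumption: environment variables win over config-file values.
    """
    env = os.environ if env is None else env
    settings = dict(DEFAULTS)
    settings.update(file_values or {})
    for name in DEFAULTS:
        if env.get(name):
            settings[name] = env[name]
    if settings["WHISPER_DEVICE"] not in ("cuda", "cpu"):
        raise ValueError("WHISPER_DEVICE must be 'cuda' or 'cpu'")
    # Without a key the server runs in trial mode; a present key still
    # has to pass the server's own validation, which is not modelled here.
    settings["trial"] = not settings["LICENSE_KEY"]
    return settings
```

Calling `load_settings({"WHISPER_MODEL": "medium"}, env={"WHISPER_DEVICE": "cpu"})` yields the medium model on CPU with `trial` set to `True`, because no license key was supplied.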
Frontend Client Setup
The frontend is a lightweight desktop app — it connects to your backend server over the network. No GPU needed on the client machine.
Installation
Download the installer for your platform from your portal dashboard:
- Windows: SpeakerMap-Setup.exe or .msi
- macOS: SpeakerMap.dmg
- Linux: .deb or .AppImage
First Launch
- Enter your frontend license key (or skip for trial mode)
- Enter your backend server URL (e.g., http://192.168.1.100:8765)
- The app connects and you're ready to transcribe
The license is tied to your hardware. It activates automatically and works offline after the initial activation, subject to the periodic re-validation described under License Management.
Using the Client
- Add Files: Drag and drop audio/video files or use the file browser
- Select Model: Choose a Whisper model size (tiny, base, small, medium, large-v3)
- Configure: Set language, output format, and diarization options
- Process: Click "Start" to begin batch transcription
- Export: Download completed transcripts in your chosen format
Editorial Review Workflow
Speaker Map includes a multi-editor document review system. Multiple editors can work on different segments of a transcript simultaneously, with video-synced editing and overlap detection.
Workflow Steps
- Create document — Import a transcript from a completed transcription job
- Ingest files — Attach audio/video files for synced playback during editing
- Editors claim segments — Each editor claims a portion of the transcript to review
- Review with video — Edit text while watching/listening to the source media in sync
- Submit for review — Mark segments as reviewed when editing is complete
- Merge & export — Combine all reviewed segments into the final transcript
Segment Statuses
pending → available → claimed → in_review → reviewed
Overlap detection prevents two editors from claiming overlapping time ranges. The system tracks who edited what and when for full audit trails.
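The status progression and the overlap rule can be illustrated with a short sketch. It assumes segments are half-open time ranges in seconds and that claimed, in-review, and reviewed segments all block new claims. The class and function names are hypothetical and do not describe Speaker Map's internal data model.

```python
from dataclasses import dataclass
from typing import Optional

STATUS_ORDER = ["pending", "available", "claimed", "in_review", "reviewed"]
BLOCKING = {"claimed", "in_review", "reviewed"}  # assumed to block new claims


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds, exclusive
    status: str = "available"
    editor: Optional[str] = None


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open ranges overlap when each one starts before the other ends."""
    return a_start < b_end and b_start < a_end


def claim(segments, start, end, editor):
    """Claim [start, end) for an editor unless it overlaps a held segment."""
    for seg in segments:
        if seg.status in BLOCKING and overlaps(start, end, seg.start, seg.end):
            raise ValueError(f"range overlaps a segment held by {seg.editor}")
    seg = Segment(start, end, "claimed", editor)
    segments.append(seg)
    return seg


def advance(seg):
    """Move a segment to the next status in the documented order."""
    i = STATUS_ORDER.index(seg.status)
    if i == len(STATUS_ORDER) - 1:
        raise ValueError("segment is already reviewed")
    seg.status = STATUS_ORDER[i + 1]
    return seg.status
```

With this rule, claims for 0 to 60 seconds and 60 to 120 seconds coexist, while a later claim for 30 to 90 seconds is rejected.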
Search & Indexing
Full-text search across all transcripts in your system. Find any word or phrase spoken in any meeting or recording.
Search Features
- Full-text search — Search across all transcript content with relevance ranking
- Speaker filter — Narrow results to a specific speaker
- Date range filter — Search within a specific time period
- Category filter — Filter by meeting type or document category
- Auto-suggest — Get search suggestions as you type
- Speaker listing — Browse all speakers across your transcript library
Reindexing
If search results seem stale, administrators can trigger a full reindex from the backend API:
POST /api/search/reindex
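As a rough sketch, the request could be prepared from Python with the standard library. It assumes the search API is served from the same base URL the client connects to (the address below is the example used in First Launch). Any authentication the endpoint requires is not documented here and is left out.

```python
import urllib.request


def build_reindex_request(base_url):
    """Build (but do not send) the POST request that triggers a full reindex."""
    url = base_url.rstrip("/") + "/api/search/reindex"
    return urllib.request.Request(url, data=b"", method="POST")


req = build_reindex_request("http://192.168.1.100:8765")
print(req.get_method(), req.full_url)

# To actually send it (requires a reachable backend):
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.status)
```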
Meeting Management
Create and manage legislative meetings with full attendee tracking, agenda building, roll call, motions, and voting.
Meeting Types
- Regular — Scheduled recurring meetings (e.g., monthly council)
- Special — Ad-hoc meetings called for specific business
- Committee — Subcommittee or working group sessions
- Hearing — Public hearings and testimony sessions
- Workshop — Informal working sessions
Attendees & Roles
Each meeting has a roster of attendees with assigned roles:
- Chair — Presides over the meeting
- Vice Chair — Backup for the chair
- Member — Voting member
- Clerk — Records minutes and manages documents
- Guest — Non-voting attendee or presenter
Agenda Items
Build structured agendas with typed items:
- Call to Order, Roll Call, Approval of Minutes
- Discussion, Action, Presentation, Report
- Public Comment, Old Business, New Business
- Adjournment
Roll Call
Record attendance for each attendee with statuses: present, absent, excused, late, remote.
Motions & Voting
Record motions during meetings and capture individual votes:
- Motion types: main, amend, table, postpone
- Vote options: yea, nay, abstain, absent
- Automatic tally with pass/fail determination
- Each vote records the attendee, their vote, and timestamp
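A tally over the four vote options might look like the sketch below. The pass rule used here (yeas must outnumber nays, with abstentions and absences ignored) is an assumption for illustration. The rule Speaker Map applies, and any quorum or supermajority handling, is not specified in this guide.

```python
VOTE_OPTIONS = ("yea", "nay", "abstain", "absent")


def tally(votes):
    """Count votes per option and decide pass/fail.

    `votes` maps an attendee name to one of VOTE_OPTIONS.
    """
    counts = {option: 0 for option in VOTE_OPTIONS}
    for attendee, vote in votes.items():
        if vote not in VOTE_OPTIONS:
            raise ValueError(f"invalid vote {vote!r} from {attendee}")
        counts[vote] += 1
    # Assumed rule: simple majority of yea over nay.
    passed = counts["yea"] > counts["nay"]
    return counts, passed


counts, passed = tally({"Chair": "yea", "Member A": "yea",
                        "Member B": "nay", "Member C": "abstain"})
print(counts, "PASS" if passed else "FAIL")
```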
Minutes Export
Generate formal meeting minutes from your meeting data. The export includes all structured meeting information in a professional format.
Template Types
- Verbatim — Full transcript with all discussion included
- Summary — Condensed overview of key points and decisions
- Action Item — Focused on decisions, motions, and action items only
Output Formats
- DOCX — Microsoft Word document with formatted headings and tables
- PDF — Print-ready document
- TXT — Plain text for archival
Generated Content
Each exported document includes:
- Title page with meeting details (date, type, location)
- Roll call with attendance status for each attendee
- Agenda items in order
- Motions with mover, seconder, and full vote tallies
- Discussion notes (verbatim template)
Captions & Subtitles
Generate closed captions from transcripts for accessibility and compliance. Captions sync with the original audio/video timing.
Supported Formats
- VTT (WebVTT) — Standard format for web video players
- SRT (SubRip) — Universal subtitle format for desktop players and editors
Configuration Options
- Max characters per line — Control line length for readability
- Max lines per caption — Limit caption block height
- Duration limits — Set minimum and maximum display time
- Reading speed — Adjust timing for audience reading speed
- Speaker labels — Optionally include speaker names in captions
Use the caption preview to review timing and formatting before exporting the final file.
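To show how the line-length and line-count limits interact with caption timing, here is a small sketch. The default values (42 characters, 2 lines) are common captioning conventions used as assumptions, not Speaker Map's documented defaults, and the function names are hypothetical. The timestamp layouts follow the SRT and WebVTT formats.

```python
import textwrap


def format_timestamp(seconds, fmt="srt"):
    """Render seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def split_caption(text, max_chars=42, max_lines=2):
    """Wrap text to max_chars per line, then group lines into caption blocks."""
    lines = textwrap.wrap(text, width=max_chars)
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]
```

For example, `format_timestamp(3661.5)` gives `01:01:01,500`, and `format_timestamp(1.25, "vtt")` gives `00:00:01.250`.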
Archive Portal
Publish meetings to a searchable public archive. Citizens and stakeholders can watch meetings with synchronized transcripts, captions, and index points.
Publishing Workflow
- Complete the meeting transcript and editorial review
- Click Publish to make the meeting publicly accessible
- The meeting appears in the public archive with full search
- Use Unpublish to remove a meeting from public view at any time
Public Viewer
The archive viewer provides:
- Synced video playback with scrolling transcript
- Closed captions overlay on video
- Clickable index points to jump to key moments
- Full-text search within a meeting
Embeddable Player
Embed the archive player in external websites using an iframe. The embeddable URL is provided for each published meeting.
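If the embed markup is generated by a script, for instance when building a meetings page, a helper along these lines could produce it. The URL in the usage note is a placeholder, not a real Speaker Map address; use the embeddable URL shown for the published meeting. The width and height values are arbitrary.

```python
from html import escape


def embed_snippet(embed_url, width=640, height=360):
    """Return iframe markup for an embeddable player URL."""
    src = escape(embed_url, quote=True)  # escape &, <, > and quotes for the attribute
    return (f'<iframe src="{src}" width="{width}" height="{height}" '
            f'allowfullscreen></iframe>')
```

Calling `embed_snippet("https://example.org/embed/1?a=1&b=2")` returns an iframe tag whose `src` has the ampersand escaped as `&amp;`.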
Archive Search
The public archive search only returns results from published meetings. Unpublished meetings remain private and are not indexed for public search.
Advanced Editor Features
Professional tools for efficient transcript editing, designed for transcriptionists and court reporters.
Waveform Visualization
The editor displays the audio waveform alongside the transcript. Click anywhere on the waveform to seek to that position. The playhead tracks the current position as audio plays.
Playback Speed
Adjust playback speed from 0.5x to 3x. Useful for slowing down fast speech or speeding through clear sections.
Configurable Hotkeys
The editor ships with 12 default keyboard shortcuts for common actions, all rebindable per editor, including:
| Action | Default Shortcut |
|---|---|
| Play / Pause | F5 |
| Rewind 5s | F3 |
| Forward 5s | F4 |
| Speed Up | F7 |
| Slow Down | F6 |
| Insert Timestamp | F8 |
| Add Index Point | F9 |
| Add Annotation | F10 |
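Per-editor rebinding can be thought of as a set of overrides layered on the defaults. The sketch below uses the eight shortcuts from the table; the action identifiers and the rule that one key cannot serve two actions are assumptions made for illustration.

```python
DEFAULT_HOTKEYS = {
    "play_pause": "F5",
    "rewind_5s": "F3",
    "forward_5s": "F4",
    "speed_up": "F7",
    "slow_down": "F6",
    "insert_timestamp": "F8",
    "add_index_point": "F9",
    "add_annotation": "F10",
}


def resolve_hotkeys(overrides):
    """Apply an editor's overrides to the defaults and reject key conflicts."""
    unknown = set(overrides) - set(DEFAULT_HOTKEYS)
    if unknown:
        raise KeyError(f"unknown actions: {sorted(unknown)}")
    bindings = {**DEFAULT_HOTKEYS, **overrides}
    seen = {}
    for action, key in bindings.items():
        if key in seen:
            raise ValueError(f"{key} is bound to both {seen[key]} and {action}")
        seen[key] = action
    return bindings
```

Here `resolve_hotkeys({"play_pause": "F2"})` succeeds, while `resolve_hotkeys({"play_pause": "F3"})` raises an error because F3 is already used for rewind.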
Foot Pedal Support
Speaker Map supports USB transcription foot pedals via WebHID. Standard 3-button foot pedals are mapped to:
- Left pedal — Rewind
- Center pedal — Play / Pause
- Right pedal — Fast forward
Multi-Track Audio
For multi-channel recordings, the editor provides per-channel controls:
- Volume — Adjust each channel independently
- Mute — Silence individual channels
- Solo — Listen to one channel in isolation
Index Points
Mark key moments on the video timeline. Index point categories:
topic, speaker, bill, motion, vote, agenda
Annotations
Add notes and markers to specific positions in the transcript. Annotation types:
- bookmark: Save a position for later
- flag: Mark something that needs attention
- note: Add a text comment
- correction: Mark a needed correction
- question: Flag something for clarification
Whisper Models
| Model | Parameters | VRAM | Speed | Accuracy |
|---|---|---|---|---|
| tiny | 39M | ~1 GB | ~32x realtime | Basic |
| base | 74M | ~1 GB | ~16x realtime | Good |
| small | 244M | ~2 GB | ~8x realtime | Better |
| medium | 769M | ~5 GB | ~4x realtime | Great |
| large-v3 | 1550M | ~10 GB | ~2x realtime | Best |
Speed is approximate on an RTX 4070 Ti. Actual performance varies with GPU, audio quality, and file length.
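The table can be turned into a simple selection rule: pick the largest model whose approximate VRAM requirement fits the GPU. The 1 GB headroom below is an assumption to leave room for other GPU usage, and the function name is hypothetical.

```python
# (model, approximate VRAM in GB), smallest to largest, taken from the table above.
MODELS = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large-v3", 10)]


def largest_model_for(vram_gb, headroom_gb=1.0):
    """Return the largest model that fits in vram_gb minus some headroom."""
    usable = vram_gb - headroom_gb
    fitting = [name for name, need in MODELS if need <= usable]
    if not fitting:
        raise ValueError(f"no model fits in {vram_gb} GB of VRAM")
    return fitting[-1]


for vram in (8, 12, 24):
    print(vram, "GB ->", largest_model_for(vram))
```

Under these assumptions an 8 GB card selects medium, while 12 GB and 24 GB cards select large-v3.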
Output Formats
TXT (Plain Text)
Simple text output with optional speaker labels. Best for reading and searching.
SRT (SubRip Subtitle)
Standard subtitle format with timestamps. Compatible with video players and editors.
VTT (WebVTT)
Web-native subtitle format. Ideal for web players and streaming platforms.
JSON (Structured Data)
Full metadata including timestamps, confidence scores, speaker IDs, and word-level timing.
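One use for the JSON output is flagging low-confidence words for editorial review. The exact schema is not documented in this guide, so every field name in the sample below (`segments`, `speaker`, `words`, `confidence`, and so on) is a hypothetical stand-in. Check a real export and adjust the keys before relying on this.

```python
import json

# Hypothetical sample; real exports may use different field names.
SAMPLE = """
{"segments": [
  {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.1,
   "words": [
     {"word": "Call",  "start": 0.0, "end": 0.4, "confidence": 0.98},
     {"word": "to",    "start": 0.4, "end": 0.6, "confidence": 0.97},
     {"word": "order", "start": 0.6, "end": 1.1, "confidence": 0.41}
   ]}
]}
"""


def low_confidence_words(doc, threshold=0.5):
    """Return (speaker, word, start_time) for words below the threshold."""
    return [
        (seg["speaker"], w["word"], w["start"])
        for seg in doc["segments"]
        for w in seg["words"]
        if w["confidence"] < threshold
    ]


doc = json.loads(SAMPLE)
print(low_confidence_words(doc))
```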
License Management
Activating Your License
When you start the backend server or frontend client with a license key, the license activates automatically on that machine. Each license has a limited number of activations:
- Backend Server: 1 activation (1 server)
- Frontend Client: 1 activation (1 workstation)
- Bundle: 2 activations (1 server + 1 workstation)
Moving to a New Machine
If you need to move your license to new hardware:
- Log into your account at think3d.ca/portal/licenses
- Find the license and click "Deactivate" on the old machine
- Install Speaker Map on the new machine and enter your license key
No need to contact support — self-service re-registration is available 24/7.
Offline Usage
Speaker Map validates your license online every 7 days. If your server is offline, you have a 30-day grace period before the license needs to re-validate. The software continues working normally during this period.
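The timing described above can be expressed as a small state check. This sketch assumes the 30-day grace period is measured from the last successful online validation, which the text does not state explicitly. The state names are made up for illustration.

```python
from datetime import datetime, timedelta

VALIDATION_INTERVAL = timedelta(days=7)   # online check every 7 days
GRACE_PERIOD = timedelta(days=30)         # assumed to run from the last successful check


def license_state(last_validated, now):
    """Classify a license by the time elapsed since its last successful validation."""
    age = now - last_validated
    if age < VALIDATION_INTERVAL:
        return "valid"
    if age < GRACE_PERIOD:
        return "grace"      # re-validation is due, but the software keeps working
    return "expired"        # must re-validate online


last = datetime(2024, 1, 1)
for days in (3, 10, 31):
    print(days, "days offline ->", license_state(last, last + timedelta(days=days)))
```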
Supported File Formats
Audio
MP3, WAV, FLAC, OGG, M4A, AAC, WMA, AIFF
Video
MP4, MKV, AVI, MOV, WMV, FLV, WebM
Audio is extracted automatically from video files before transcription.
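For pre-processing files outside Speaker Map, the format lists and the extraction step can be approximated as below. Speaker Map's internal extraction pipeline is not documented here; the ffmpeg arguments are one reasonable choice that drops the video stream and writes 16 kHz mono PCM, the sample rate Whisper works with. The function names are hypothetical, and the command is only built, not executed.

```python
from pathlib import Path

AUDIO = {".mp3", ".wav", ".flac", ".ogg", ".m4a", ".aac", ".wma", ".aiff"}
VIDEO = {".mp4", ".mkv", ".avi", ".mov", ".wmv", ".flv", ".webm"}


def media_kind(path):
    """Classify a file as 'audio' or 'video' by its extension."""
    ext = Path(path).suffix.lower()
    if ext in AUDIO:
        return "audio"
    if ext in VIDEO:
        return "video"
    raise ValueError(f"unsupported file format: {ext or path}")


def extract_audio_command(video_path, wav_path):
    """Build ffmpeg arguments: no video (-vn), mono (-ac 1), 16 kHz (-ar 16000), 16-bit PCM."""
    return ["ffmpeg", "-i", str(video_path), "-vn", "-ac", "1",
            "-ar", "16000", "-c:a", "pcm_s16le", str(wav_path)]


print(media_kind("council-2024-03.MP4"))
print(" ".join(extract_audio_command("council-2024-03.mp4", "council-2024-03.wav")))
```

The resulting list can be passed to `subprocess.run(..., check=True)` on a machine where ffmpeg is installed.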
FAQ
How accurate is the transcription?
Using the large-v3 model, Speaker Map achieves word error rates (WER) comparable to commercial cloud services, typically under 5% for clear English audio. Accuracy improves with higher-quality audio and with larger models, which need more VRAM.
What languages are supported?
Whisper supports 99 languages for transcription and translation to English. Language can be auto-detected or manually specified.
Can multiple users share one backend server?
Yes. One backend server can handle requests from multiple frontend clients. Each frontend client needs its own license, but the backend server only needs one license.
Is my data sent to the cloud?
No. All transcription processing happens locally on your hardware. The only network communication is periodic license validation with our server (a small metadata check — no audio data is transmitted).
What GPU do I need?
Any NVIDIA GPU with 8GB+ VRAM and CUDA support. Recommended: RTX 3060 12GB or better. For the large-v3 model, 10GB+ VRAM is recommended.
Can Speaker Map manage legislative meetings?
Yes. Speaker Map includes a full legislative platform with meeting management, attendee tracking, agenda building, roll call, motions, voting with automatic tallies, and formal minutes export to DOCX/PDF. It's designed for city councils, legislatures, and government bodies.
Does it support foot pedals?
Yes. Speaker Map supports USB transcription foot pedals via WebHID. Standard 3-button pedals are automatically mapped to rewind, play/pause, and fast-forward. No drivers or additional software needed.
Can we publish meetings publicly?
Yes. The archive portal lets you publish meetings with synced video, searchable transcript, closed captions, and index points. Citizens can browse and search published meetings. You can also embed the player on external websites via iframe.
Need help? Contact us at [email protected]