Documentation — Speaker Map

Quick Start Guide

1. Purchase a License

Create an account and purchase a Backend Server, Frontend Client, or Bundle license from the portal.

2. Install the Backend Server

Download and install the backend on a machine with an NVIDIA GPU. Enter your license key when prompted.

3. Install the Frontend Client

Download and install the desktop client on any Windows, macOS, or Linux machine with network access to the backend server.

4. Start Transcribing

Point the client at your backend server, add files, and start batch transcription.

Backend Server Setup

Prerequisites

NVIDIA GPU with 8GB+ VRAM and CUDA 11.8 or 12.x installed
Python 3.10 or newer
Windows 10/11 or Ubuntu 22.04+
16GB RAM minimum

Option 1: Automated Installer (Recommended)

The installer handles everything — Python venv, dependencies, license activation, and service setup.

# Linux
sudo bash install.sh

# Windows (PowerShell as Administrator)
powershell -ExecutionPolicy Bypass -File install.ps1

The installer will:

Check prerequisites (Python 3.10+, GPU, ffmpeg)
Create a Python virtual environment and install dependencies
Prompt for your license key and activate it
Create a system service (systemd on Linux, scheduled task on Windows)
Start the server automatically

Option 2: Docker

Requires Docker with NVIDIA Container Toolkit.

# Set your license key
export LICENSE_KEY=SM-XXXX-XXXX-XXXX-XXXX

# Start with GPU support
docker compose up -d

Option 3: Manual Setup

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # Linux
# .venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

# Copy config template
cp config.yaml.example config.yaml
# Edit config.yaml — set your license key

# Start the server
python run.py

Configuration

Settings are in config.yaml. Key options can also be set via environment variables:

LICENSE_KEY=SM-XXXX-XXXX-XXXX-XXXX    # License key
HUGGINGFACE_TOKEN=hf_xxxxx             # For speaker diarization
WHISPER_MODEL=large-v3                 # Default model
WHISPER_DEVICE=cuda                    # cuda or cpu

The server validates your license on startup. With a valid license, all features are unlocked. Without a key, it runs in trial mode.
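
As a rough illustration of how the environment-variable overrides above might be read, here is a minimal Python sketch. The `ServerSettings` class and `load_settings` function are hypothetical names, not part of Speaker Map. The `large-v3` default follows the listing above; the `cuda` default and the validation rule are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ServerSettings:
    license_key: Optional[str]        # None is treated as trial mode
    huggingface_token: Optional[str]  # only needed for speaker diarization
    whisper_model: str
    whisper_device: str


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read the four documented variables (hypothetical helper)."""
    device = env.get("WHISPER_DEVICE", "cuda")  # default is an assumption
    if device not in ("cuda", "cpu"):
        raise ValueError(f"WHISPER_DEVICE must be 'cuda' or 'cpu', got {device!r}")
    return ServerSettings(
        license_key=env.get("LICENSE_KEY") or None,
        huggingface_token=env.get("HUGGINGFACE_TOKEN") or None,
        whisper_model=env.get("WHISPER_MODEL", "large-v3"),
        whisper_device=device,
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to try out, for example `load_settings({"WHISPER_DEVICE": "cpu"})`.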

Frontend Client Setup

The frontend is a lightweight desktop app — it connects to your backend server over the network. No GPU needed on the client machine.

Installation

Download the installer for your platform from your portal dashboard:

Windows: SpeakerMap-Setup.exe or .msi
macOS: SpeakerMap.dmg
Linux: .deb or .AppImage

First Launch

Enter your frontend license key (or skip for trial mode)
Enter your backend server URL (e.g., http://192.168.1.100:8765)
The app connects and you're ready to transcribe

The license is tied to your hardware. It activates automatically and keeps working offline after the initial activation, within the 30-day grace period described under Offline Usage.

Using the Client

Add Files: Drag and drop audio/video files or use the file browser
Select Model: Choose a Whisper model size (tiny, base, small, medium, large-v3)
Configure: Set language, output format, and diarization options
Process: Click "Start" to begin batch transcription
Export: Download completed transcripts in your chosen format

Editorial Review Workflow

Speaker Map includes a multi-editor document review system. Multiple editors can work on different segments of a transcript simultaneously, with video-synced editing and overlap detection.

Workflow Steps

Create document — Import a transcript from a completed transcription job
Ingest files — Attach audio/video files for synced playback during editing
Editors claim segments — Each editor claims a portion of the transcript to review
Review with video — Edit text while watching/listening to the source media in sync
Submit for review — Mark segments as reviewed when editing is complete
Merge & export — Combine all reviewed segments into the final transcript

Segment Statuses

pending → available → claimed → in_review → reviewed
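
The status flow above can be modeled as a small state machine. The sketch below is illustrative only: the names are hypothetical, and it assumes segments only ever move one step forward. The documentation does not say whether backward moves, such as releasing a claim, are allowed.

```python
from enum import Enum


class SegmentStatus(str, Enum):
    PENDING = "pending"
    AVAILABLE = "available"
    CLAIMED = "claimed"
    IN_REVIEW = "in_review"
    REVIEWED = "reviewed"


# Enum members iterate in definition order, which matches the documented flow.
_FLOW = list(SegmentStatus)


def can_transition(current: SegmentStatus, target: SegmentStatus) -> bool:
    """Assumption: only a single forward step is a valid transition."""
    return _FLOW.index(target) == _FLOW.index(current) + 1


def next_status(current: SegmentStatus) -> SegmentStatus:
    """Return the status that follows `current` in the documented flow."""
    idx = _FLOW.index(current)
    if idx == len(_FLOW) - 1:
        raise ValueError("'reviewed' is the final status")
    return _FLOW[idx + 1]
```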

Overlap detection prevents two editors from claiming overlapping time ranges. The system tracks who edited what and when for full audit trails.
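
One common way to implement this kind of overlap check is an interval comparison. The following is a sketch of the idea, not Speaker Map's actual code. `Claim` and `find_conflict` are hypothetical names, and treating ranges as half-open (so back-to-back segments do not conflict) is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Claim:
    editor: str
    start: float  # seconds from the start of the media
    end: float    # exclusive end, in seconds


def overlaps(a: Claim, b: Claim) -> bool:
    """Two half-open ranges overlap when each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def find_conflict(new: Claim, existing: Iterable[Claim]) -> Optional[Claim]:
    """Return the first existing claim that overlaps `new`, or None."""
    for claim in existing:
        if overlaps(new, claim):
            return claim
    return None
```

With this rule, a claim for 600 to 900 seconds is accepted next to an existing claim for 300 to 600 seconds, while a claim for 500 to 700 seconds is rejected.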

Search & Indexing

Full-text search across all transcripts in your system. Find any word or phrase spoken in any meeting or recording.

Search Features

Full-text search — Search across all transcript content with relevance ranking
Speaker filter — Narrow results to a specific speaker
Date range filter — Search within a specific time period
Category filter — Filter by meeting type or document category
Auto-suggest — Get search suggestions as you type
Speaker listing — Browse all speakers across your transcript library

Reindexing

If search results seem stale, administrators can trigger a full reindex from the backend API:

POST /api/search/reindex
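
A minimal sketch of calling this endpoint from Python with the standard library follows. Only the path `POST /api/search/reindex` comes from this page. The bearer-token header is a hypothetical placeholder, since the authentication scheme is not described here, and the response format is unknown, so only the HTTP status is returned.

```python
import urllib.request
from typing import Optional


def build_reindex_request(base_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the reindex request."""
    url = base_url.rstrip("/") + "/api/search/reindex"
    headers = {}
    if token:
        # Hypothetical: the real API may use a different auth mechanism.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


def trigger_reindex(base_url: str, token: Optional[str] = None, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code."""
    request = build_reindex_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `trigger_reindex("http://192.168.1.100:8765")` would target the sample server address used in the First Launch section.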

Meeting Management

Create and manage legislative meetings with full attendee tracking, agenda building, roll call, motions, and voting.

Meeting Types

Regular — Scheduled recurring meetings (e.g., monthly council)
Special — Ad-hoc meetings called for specific business
Committee — Subcommittee or working group sessions
Hearing — Public hearings and testimony sessions
Workshop — Informal working sessions

Attendees & Roles

Each meeting has a roster of attendees with assigned roles:

Chair — Presides over the meeting
Vice Chair — Backup for the chair
Member — Voting member
Clerk — Records minutes and manages documents
Guest — Non-voting attendee or presenter

Agenda Items

Build structured agendas with typed items:

Call to Order, Roll Call, Approval of Minutes
Discussion, Action, Presentation, Report
Public Comment, Old Business, New Business
Adjournment

Roll Call

Record attendance for each attendee with statuses: present, absent, excused, late, remote.

Motions & Voting

Record motions during meetings and capture individual votes:

Motion types: main, amend, table, postpone
Vote options: yea, nay, abstain, absent
Automatic tally with pass/fail determination
Each vote records the attendee, their vote, and timestamp
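
To make the tally concrete, here is a small sketch. The names are hypothetical, and the pass rule (more yea than nay votes, with abstentions and absences not counted and ties failing) is an assumption. The actual threshold used by Speaker Map is not stated on this page and may depend on the motion type.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable

VOTE_OPTIONS = ("yea", "nay", "abstain", "absent")


@dataclass(frozen=True)
class Vote:
    attendee: str
    choice: str          # one of VOTE_OPTIONS
    timestamp: datetime  # when the vote was recorded


def tally(votes: Iterable[Vote]) -> Dict[str, int]:
    """Count votes per option, including options nobody chose."""
    counts = {option: 0 for option in VOTE_OPTIONS}
    for vote in votes:
        if vote.choice not in counts:
            raise ValueError(f"unknown vote option: {vote.choice!r}")
        counts[vote.choice] += 1
    return counts


def passes(counts: Dict[str, int]) -> bool:
    """Assumed rule: simple majority of yea over nay; a tie fails."""
    return counts["yea"] > counts["nay"]
```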

Minutes Export

Generate formal meeting minutes from your meeting data. The export includes all structured meeting information in a professional format.

Template Types

Verbatim — Full transcript with all discussion included
Summary — Condensed overview of key points and decisions
Action Item — Focused on decisions, motions, and action items only

Output Formats

DOCX — Microsoft Word document with formatted headings and tables
PDF — Print-ready document
TXT — Plain text for archival

Generated Content

Each exported document includes:

Title page with meeting details (date, type, location)
Roll call with attendance status for each attendee
Agenda items in order
Motions with mover, seconder, and full vote tallies
Discussion notes (verbatim template)

Captions & Subtitles

Generate closed captions from transcripts for accessibility and compliance. Captions sync with the original audio/video timing.

Supported Formats

VTT (WebVTT) — Standard format for web video players
SRT (SubRip) — Universal subtitle format for desktop players and editors

Configuration Options

Max characters per line — Control line length for readability
Max lines per caption — Limit caption block height
Duration limits — Set minimum and maximum display time
Reading speed — Adjust timing for audience reading speed
Speaker labels — Optionally include speaker names in captions

Use the caption preview to review timing and formatting before exporting the final file.
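
The sketch below illustrates how the line-length and line-count options might interact with the two caption formats. It is not the product's implementation. The function names and the defaults of 42 characters and 2 lines are assumptions, and duration limits and reading speed are left out for brevity. The timestamp formats themselves are standard: SRT separates milliseconds with a comma, WebVTT with a period.

```python
import textwrap
from typing import List, Optional


def format_timestamp(seconds: float, fmt: str = "srt") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    if fmt not in ("srt", "vtt"):
        raise ValueError("fmt must be 'srt' or 'vtt'")
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def wrap_caption(text: str, max_chars: int = 42, max_lines: int = 2,
                 speaker: Optional[str] = None) -> List[List[str]]:
    """Split text into caption blocks of at most max_lines lines each."""
    if speaker:
        text = f"{speaker}: {text}"
    lines = textwrap.wrap(text, width=max_chars)
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


def srt_cue(index: int, start: float, end: float, lines: List[str]) -> str:
    """Render one SRT cue: sequence number, time range, then the text lines."""
    time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    return f"{index}\n{time_range}\n" + "\n".join(lines) + "\n"
```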

Archive Portal

Publish meetings to a searchable public archive. Citizens and stakeholders can watch meetings with synchronized transcripts, captions, and index points.

Publishing Workflow

Complete the meeting transcript and editorial review
Click Publish to make the meeting publicly accessible
The meeting appears in the public archive with full search
Use Unpublish to remove a meeting from public view at any time

Public Viewer

The archive viewer provides:

Synced video playback with scrolling transcript
Closed captions overlay on video
Clickable index points to jump to key moments
Full-text search within a meeting

Embeddable Player

Embed the archive player in external websites using an iframe. The embeddable URL is provided for each published meeting.
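
If you generate embed markup programmatically, a helper along these lines could be used. This is a sketch: `iframe_snippet` is a hypothetical name, the width and height defaults are assumptions, and the embed URL must be the one shown for your published meeting.

```python
from html import escape


def iframe_snippet(embed_url: str, width: int = 640, height: int = 360) -> str:
    """Return iframe markup for an embed URL, escaping it for use in an attribute."""
    safe_url = escape(embed_url, quote=True)
    return (
        f'<iframe src="{safe_url}" width="{width}" height="{height}" '
        "allowfullscreen></iframe>"
    )
```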

Archive Search

The public archive search only returns results from published meetings. Unpublished meetings remain private and are not indexed for public search.

Advanced Editor Features

Professional tools for efficient transcript editing, designed for transcriptionists and court reporters.

Waveform Visualization

The editor displays the audio waveform alongside the transcript. Click anywhere on the waveform to seek to that position. The playhead tracks the current position as audio plays.

Playback Speed

Adjust playback speed from 0.5x to 3x. Useful for slowing down fast speech or speeding through clear sections.

Configurable Hotkeys

12 default keyboard shortcuts for common actions, fully rebindable per editor. Eight of them are shown below:

Action	Default Shortcut
Play / Pause	F5
Rewind 5s	F3
Forward 5s	F4
Speed Up	F7
Slow Down	F6
Insert Timestamp	F8
Add Index Point	F9
Add Annotation	F10

Foot Pedal Support

Speaker Map supports USB transcription foot pedals via WebHID. Standard 3-button foot pedals are mapped to:

Left pedal — Rewind
Center pedal — Play / Pause
Right pedal — Fast forward

Multi-Track Audio

For multi-channel recordings, the editor provides per-channel controls:

Volume — Adjust each channel independently
Mute — Silence individual channels
Solo — Listen to one channel in isolation

Index Points

Mark key moments on the video timeline. Index point categories:

topic, speaker, bill, motion, vote, agenda

Annotations

Add notes and markers to specific positions in the transcript. Annotation types:

bookmark — Save a position for later
flag — Mark something that needs attention
note — Add a text comment
correction — Mark a needed correction
question — Flag something for clarification

Whisper Models

Model	Parameters	VRAM	Speed	Accuracy
tiny	39M	~1 GB	~32x realtime	Basic
base	74M	~1 GB	~16x realtime	Good
small	244M	~2 GB	~8x realtime	Better
medium	769M	~5 GB	~4x realtime	Great
large-v3	1550M	~10 GB	~2x realtime	Best

Speeds are approximate and were measured on an RTX 4070 Ti. Actual performance varies with GPU, audio quality, and file length.
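
The table can be turned into a quick back-of-the-envelope estimate. The sketch below copies the approximate VRAM and speed figures from the table; the helper names are hypothetical, and the results are only as accurate as those figures.

```python
# (model name, approx. VRAM in GB, approx. speed as a multiple of realtime)
MODELS = [
    ("tiny", 1, 32),
    ("base", 1, 16),
    ("small", 2, 8),
    ("medium", 5, 4),
    ("large-v3", 10, 2),
]


def largest_model_for(vram_gb: float) -> str:
    """Pick the largest model whose approximate VRAM need fits the GPU."""
    fitting = [name for name, vram, _ in MODELS if vram <= vram_gb]
    if not fitting:
        raise ValueError("no model fits in the available VRAM")
    return fitting[-1]


def estimated_minutes(audio_minutes: float, model: str) -> float:
    """Estimate processing time: audio length divided by the realtime factor."""
    speed = {name: factor for name, _, factor in MODELS}[model]
    return audio_minutes / speed
```

For a two-hour recording, `estimated_minutes(120, "large-v3")` gives about 60 minutes and `estimated_minutes(120, "medium")` about 30 minutes on comparable hardware.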

Output Formats

TXT (Plain Text)

Simple text output with optional speaker labels. Best for reading and searching.

SRT (SubRip Subtitle)

Standard subtitle format with timestamps. Compatible with video players and editors.

VTT (WebVTT)

Web-native subtitle format. Ideal for web players and streaming platforms.

JSON (Structured Data)

Full metadata including timestamps, confidence scores, speaker IDs, and word-level timing.

License Management

Activating Your License

When you start the backend server or frontend client with a license key, it automatically activates on that machine. Each license has a limited number of activations:

Backend Server: 1 activation (1 server)
Frontend Client: 1 activation (1 workstation)
Bundle: 2 activations (1 server + 1 workstation)

Moving to a New Machine

If you need to move your license to new hardware:

Log in to your account at think3d.ca/portal/licenses
Find the license and click "Deactivate" next to the old machine
Install Speaker Map on the new machine and enter your license key

No need to contact support — self-service re-registration is available 24/7.

Offline Usage

Speaker Map validates your license online every 7 days. If your server is offline, you have a 30-day grace period before the license needs to re-validate. The software continues working normally during this period.
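
The timing rule can be sketched as follows. The state names are hypothetical, and the sketch assumes that both the 7-day interval and the 30-day grace period are counted from the last successful validation. The exact reference point is not stated here.

```python
from datetime import datetime, timedelta

VALIDATION_INTERVAL = timedelta(days=7)
GRACE_PERIOD = timedelta(days=30)


def license_state(last_validated: datetime, now: datetime) -> str:
    """Classify the license by time elapsed since the last successful check."""
    elapsed = now - last_validated
    if elapsed < VALIDATION_INTERVAL:
        return "valid"  # no online check due yet
    if elapsed <= GRACE_PERIOD:
        return "grace"  # check is due; software keeps working normally
    return "revalidation_required"
```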

Supported File Formats

Audio

MP3, WAV, FLAC, OGG, M4A, AAC, WMA, AIFF

Video

MP4, MKV, AVI, MOV, WMV, FLV, WebM

Audio is extracted automatically from video files before transcription.
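
For reference, audio extraction of this kind is commonly done with ffmpeg, which the installer checks for. The sketch below builds such a command. It is an assumption about the approach, not Speaker Map's actual pipeline. The helper names are hypothetical, and 16 kHz mono WAV is chosen because Whisper models work on 16 kHz mono audio.

```python
from pathlib import Path
from typing import List, Union

VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".wmv", ".flv", ".webm"}


def is_video(path: Union[str, Path]) -> bool:
    """True if the file extension is one of the listed video formats."""
    return Path(path).suffix.lower() in VIDEO_EXTENSIONS


def ffmpeg_extract_command(src: Union[str, Path], dst: Union[str, Path]) -> List[str]:
    """Build an ffmpeg command that drops video and writes 16 kHz mono audio."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it exists
        "-i", str(src),  # input file
        "-vn",           # discard the video stream
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        str(dst),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.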

FAQ

How accurate is the transcription?

Using the large-v3 model, Speaker Map achieves word error rates (WER) comparable to commercial cloud services, typically under 5% for clear English audio. Accuracy improves with higher-quality audio and with larger models, which in turn need more VRAM.

What languages are supported?

Whisper supports 99 languages for transcription and translation to English. Language can be auto-detected or manually specified.

Can multiple users share one backend server?

Yes. One backend server can handle requests from multiple frontend clients. Each frontend client needs its own license, but the backend server only needs one license.

Is my data sent to the cloud?

No. All transcription processing happens locally on your hardware. The only network communication is periodic license validation with our server (a small metadata check — no audio data is transmitted).

What GPU do I need?

Any NVIDIA GPU with 8GB+ VRAM and CUDA support. Recommended: RTX 3060 12GB or better. For the large-v3 model, 10GB+ VRAM is recommended.

Can Speaker Map manage legislative meetings?

Yes. Speaker Map includes a full legislative platform with meeting management, attendee tracking, agenda building, roll call, motions, voting with automatic tallies, and formal minutes export to DOCX/PDF. It's designed for city councils, legislatures, and government bodies.

Does it support foot pedals?

Yes. Speaker Map supports USB transcription foot pedals via WebHID. Standard 3-button pedals are automatically mapped to rewind, play/pause, and fast-forward. No drivers or additional software needed.

Can we publish meetings publicly?

Yes. The archive portal lets you publish meetings with synced video, searchable transcript, closed captions, and index points. Citizens can browse and search published meetings. You can also embed the player on external websites via iframe.

Need help? Contact us at [email protected]