Documentation

Quick Start Guide

1. Purchase a License

Create an account and purchase a Backend Server, Frontend Client, or Bundle license from the portal.

2. Install the Backend Server

Download and install the backend on a machine with an NVIDIA GPU. Enter your license key when prompted.

3. Install the Frontend Client

Download and install the desktop client on any Windows, macOS, or Linux machine with network access to the backend server.

4. Start Transcribing

Point the client at your backend server, add files, and start batch transcription.

Backend Server Setup

Prerequisites

  • NVIDIA GPU with 8GB+ VRAM and CUDA 11.8 or 12.x installed
  • Python 3.10 or newer
  • Windows 10/11 or Ubuntu 22.04+
  • 16GB RAM minimum
  • ffmpeg installed

Option 1: Automated Installer (Recommended)

The installer handles everything — Python venv, dependencies, license activation, and service setup.

# Linux
sudo bash install.sh

# Windows (PowerShell as Administrator)
powershell -ExecutionPolicy Bypass -File install.ps1

The installer will:

  1. Check prerequisites (Python 3.10+, GPU, ffmpeg)
  2. Create a Python virtual environment and install dependencies
  3. Prompt for your license key and activate it
  4. Create a system service (systemd on Linux, scheduled task on Windows)
  5. Start the server automatically

Option 2: Docker

Requires Docker with NVIDIA Container Toolkit.

# Set your license key
export LICENSE_KEY=SM-XXXX-XXXX-XXXX-XXXX

# Start with GPU support
docker compose up -d

Option 3: Manual Setup

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # Linux
# .venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

# Copy config template
cp config.yaml.example config.yaml
# Edit config.yaml — set your license key

# Start the server
python run.py

Configuration

Settings are in config.yaml. Key options can also be set via environment variables:

LICENSE_KEY=SM-XXXX-XXXX-XXXX-XXXX    # License key
HUGGINGFACE_TOKEN=hf_xxxxx             # For speaker diarization
WHISPER_MODEL=large-v3                 # Default model
WHISPER_DEVICE=cuda                    # cuda or cpu

The server validates your license on startup. With a valid license, all features are unlocked. Without a key, it runs in trial mode.
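If you script deployments, it can help to check these variables before starting the server. The sketch below is illustrative only: the helper name `load_env_settings`, the fallback values (taken from the examples above), and the assumed key pattern (four groups of four uppercase letters or digits after `SM-`) are assumptions, not Speaker Map's own loader.

```python
import os
import re
from typing import Mapping, Optional

# Assumed key shape: "SM-" followed by four groups of four uppercase letters/digits.
LICENSE_KEY_PATTERN = re.compile(r"^SM-[A-Z0-9]{4}(?:-[A-Z0-9]{4}){3}$")
VALID_DEVICES = {"cuda", "cpu"}


def load_env_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: read the documented variables and validate them."""
    env = os.environ if env is None else env

    license_key = env.get("LICENSE_KEY", "").strip()
    if license_key and not LICENSE_KEY_PATTERN.match(license_key):
        raise ValueError(f"LICENSE_KEY has an unexpected format: {license_key!r}")

    device = env.get("WHISPER_DEVICE", "cuda").lower()
    if device not in VALID_DEVICES:
        raise ValueError(f"WHISPER_DEVICE must be one of {sorted(VALID_DEVICES)}, got {device!r}")

    return {
        "license_key": license_key or None,
        # Without a key the server would start in trial mode.
        "trial_mode": not license_key,
        "huggingface_token": env.get("HUGGINGFACE_TOKEN") or None,
        "whisper_model": env.get("WHISPER_MODEL", "large-v3"),
        "whisper_device": device,
    }
```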

Frontend Client Setup

The frontend is a lightweight desktop app — it connects to your backend server over the network. No GPU needed on the client machine.

Installation

Download the installer for your platform from your portal dashboard:

  • Windows: SpeakerMap-Setup.exe or .msi
  • macOS: SpeakerMap.dmg
  • Linux: .deb or .AppImage

First Launch

  1. Enter your frontend license key (or skip for trial mode)
  2. Enter your backend server URL (e.g., http://192.168.1.100:8765)
  3. The app connects and you're ready to transcribe

The license is tied to your hardware. It activates automatically and works offline after the initial activation, subject to the periodic online validation described under License Management.
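If the client cannot connect in step 2, a quick way to rule out network problems is to confirm that the backend's port is reachable from the client machine. This minimal Python sketch (the function name is hypothetical) opens a plain TCP connection to the host and port in the server URL. It does not speak Speaker Map's protocol, so success only shows that the port is open.

```python
import socket
from urllib.parse import urlsplit


def backend_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    This only proves the port accepts connections; it does not check that
    Speaker Map itself is answering on it.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"Not a valid server URL: {url!r}")
    # Fall back to the scheme's standard port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `backend_reachable("http://192.168.1.100:8765")` returns `False` if a firewall or a stopped service is blocking the connection.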

Using the Client

  1. Add Files: Drag and drop audio/video files or use the file browser
  2. Select Model: Choose a Whisper model size (tiny, base, small, medium, large-v3)
  3. Configure: Set language, output format, and diarization options
  4. Process: Click "Start" to begin batch transcription
  5. Export: Download completed transcripts in your chosen format

Editorial Review Workflow

Speaker Map includes a multi-editor document review system. Multiple editors can work on different segments of a transcript simultaneously, with video-synced editing and overlap detection.

Workflow Steps

  1. Create document — Import a transcript from a completed transcription job
  2. Ingest files — Attach audio/video files for synced playback during editing
  3. Editors claim segments — Each editor claims a portion of the transcript to review
  4. Review with video — Edit text while watching/listening to the source media in sync
  5. Submit for review — Mark segments as reviewed when editing is complete
  6. Merge & export — Combine all reviewed segments into the final transcript

Segment Statuses

pending → available → claimed → in_review → reviewed
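The statuses form a one-way pipeline. A minimal sketch of that ordering, assuming a segment only ever moves forward one step at a time (the real system may also allow, for example, releasing a claim back to available), could look like this; `SegmentStatus` and `advance` are illustrative names.

```python
from enum import Enum


class SegmentStatus(Enum):
    PENDING = "pending"
    AVAILABLE = "available"
    CLAIMED = "claimed"
    IN_REVIEW = "in_review"
    REVIEWED = "reviewed"


# Enum iteration follows definition order, which matches the documented lifecycle.
_LIFECYCLE = list(SegmentStatus)


def advance(status: SegmentStatus) -> SegmentStatus:
    """Return the next status in the lifecycle, or raise if already reviewed."""
    index = _LIFECYCLE.index(status)
    if index == len(_LIFECYCLE) - 1:
        raise ValueError("Segment is already reviewed")
    return _LIFECYCLE[index + 1]
```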

Overlap detection prevents two editors from claiming overlapping time ranges. The system tracks who edited what and when for full audit trails.
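Overlap detection comes down to an interval test. The sketch below assumes claims are half-open time ranges in seconds, so a claim ending at 300 s does not conflict with one starting at 300 s. Whether Speaker Map treats shared boundaries the same way is an assumption, and `Claim` and `find_conflict` are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Claim:
    editor: str
    start: float  # seconds, inclusive
    end: float    # seconds, exclusive


def overlaps(a: Claim, b: Claim) -> bool:
    """Half-open ranges [start, end) overlap when each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def find_conflict(new: Claim, existing: Iterable[Claim]) -> Optional[Claim]:
    """Return the first existing claim that overlaps the new one, or None."""
    for claim in existing:
        if overlaps(new, claim):
            return claim
    return None
```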

Search & Indexing

Full-text search across all transcripts in your system. Find any word or phrase spoken in any meeting or recording.

Search Features

  • Full-text search — Search across all transcript content with relevance ranking
  • Speaker filter — Narrow results to a specific speaker
  • Date range filter — Search within a specific time period
  • Category filter — Filter by meeting type or document category
  • Auto-suggest — Get search suggestions as you type
  • Speaker listing — Browse all speakers across your transcript library

Reindexing

If search results seem stale, administrators can trigger a full reindex from the backend API:

POST /api/search/reindex
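For scripted maintenance, the same request can be built with Python's standard library. This sketch assumes the backend URL from the client setup example and a bearer-token header. The documentation does not describe how this endpoint is authenticated, so the `Authorization` handling and the `build_reindex_request` name are hypothetical.

```python
import urllib.request
from typing import Optional


def build_reindex_request(base_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to the documented reindex endpoint."""
    url = base_url.rstrip("/") + "/api/search/reindex"
    headers = {}
    if token:
        # Hypothetical: the auth scheme for admin endpoints is not documented.
        headers["Authorization"] = f"Bearer {token}"
    # An empty body is sent; the endpoint is not documented to take parameters.
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")
```

To send it, pass the request to `urllib.request.urlopen(request, timeout=30)`; a full reindex may take a while on a large library.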

Meeting Management

Create and manage legislative meetings with full attendee tracking, agenda building, roll call, motions, and voting.

Meeting Types

  • Regular — Scheduled recurring meetings (e.g., monthly council)
  • Special — Ad-hoc meetings called for specific business
  • Committee — Subcommittee or working group sessions
  • Hearing — Public hearings and testimony sessions
  • Workshop — Informal working sessions

Attendees & Roles

Each meeting has a roster of attendees with assigned roles:

  • Chair — Presides over the meeting
  • Vice Chair — Backup for the chair
  • Member — Voting member
  • Clerk — Records minutes and manages documents
  • Guest — Non-voting attendee or presenter

Agenda Items

Build structured agendas with typed items:

  • Call to Order, Roll Call, Approval of Minutes
  • Discussion, Action, Presentation, Report
  • Public Comment, Old Business, New Business
  • Adjournment

Roll Call

Record attendance for each attendee with statuses: present, absent, excused, late, remote.

Motions & Voting

Record motions during meetings and capture individual votes:

  • Motion types: main, amend, table, postpone
  • Vote options: yea, nay, abstain, absent
  • Automatic tally with pass/fail determination
  • Each vote records the attendee, their vote, and timestamp
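The exact pass/fail rule is not specified here, and it varies between bodies (some require a majority of members present, or two-thirds for certain motions). The sketch below assumes a simple majority of votes cast, with abstentions and absences not counted, and treats a tie as failing; `Vote` and `tally` are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable

VOTE_OPTIONS = {"yea", "nay", "abstain", "absent"}


@dataclass(frozen=True)
class Vote:
    attendee: str
    vote: str
    timestamp: datetime


def tally(votes: Iterable[Vote]) -> dict:
    """Count votes and apply an assumed simple-majority rule (yea > nay)."""
    counts = Counter({option: 0 for option in VOTE_OPTIONS})
    seen = set()
    for v in votes:
        if v.vote not in VOTE_OPTIONS:
            raise ValueError(f"Unknown vote {v.vote!r} from {v.attendee}")
        if v.attendee in seen:
            raise ValueError(f"{v.attendee} voted more than once")
        seen.add(v.attendee)
        counts[v.vote] += 1
    # Assumption: abstentions and absences are not votes cast; a tie fails.
    passed = counts["yea"] > counts["nay"]
    return {**counts, "result": "pass" if passed else "fail"}
```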

Minutes Export

Generate formal meeting minutes from your meeting data. Each export assembles the meeting's structured records (roll call, agenda, motions, and votes) into a formatted document.

Template Types

  • Verbatim — Full transcript with all discussion included
  • Summary — Condensed overview of key points and decisions
  • Action Item — Focused on decisions, motions, and action items only

Output Formats

  • DOCX — Microsoft Word document with formatted headings and tables
  • PDF — Print-ready document
  • TXT — Plain text for archival

Generated Content

Each exported document includes:

  • Title page with meeting details (date, type, location)
  • Roll call with attendance status for each attendee
  • Agenda items in order
  • Motions with mover, seconder, and full vote tallies
  • Discussion notes (verbatim template)

Captions & Subtitles

Generate closed captions from transcripts for accessibility and compliance. Captions sync with the original audio/video timing.

Supported Formats

  • VTT (WebVTT) — Standard format for web video players
  • SRT (SubRip) — Universal subtitle format for desktop players and editors

Configuration Options

  • Max characters per line — Control line length for readability
  • Max lines per caption — Limit caption block height
  • Duration limits — Set minimum and maximum display time
  • Reading speed — Adjust timing for audience reading speed
  • Speaker labels — Optionally include speaker names in captions

Use the caption preview to review timing and formatting before exporting the final file.
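The line-length, line-count, duration, and reading-speed options interact roughly as follows. This sketch wraps text into caption blocks and derives a display time from a reading speed in characters per second, clamped to minimum and maximum durations. The parameter names and default values (42 characters, 2 lines, 15 characters per second, 1 to 7 seconds) are illustrative assumptions, not Speaker Map's defaults.

```python
import textwrap


def layout_caption(text: str, max_chars: int = 42, max_lines: int = 2) -> list[list[str]]:
    """Split text into caption blocks of at most max_lines lines of max_chars each."""
    lines = textwrap.wrap(text, width=max_chars, break_long_words=False)
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


def display_duration(block: list[str], chars_per_second: float = 15.0,
                     min_s: float = 1.0, max_s: float = 7.0) -> float:
    """Seconds needed to read a block at the given speed, clamped to the limits."""
    chars = sum(len(line) for line in block)
    return min(max(chars / chars_per_second, min_s), max_s)
```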

Archive Portal

Publish meetings to a searchable public archive. Citizens and stakeholders can watch meetings with synchronized transcripts, captions, and index points.

Publishing Workflow

  1. Complete the meeting transcript and editorial review
  2. Click Publish to make the meeting publicly accessible
  3. The meeting appears in the public archive with full search
  4. Use Unpublish to remove a meeting from public view at any time

Public Viewer

The archive viewer provides:

  • Synced video playback with scrolling transcript
  • Closed captions overlay on video
  • Clickable index points to jump to key moments
  • Full-text search within a meeting

Embeddable Player

Embed the archive player in external websites using an iframe. The embeddable URL is provided for each published meeting.

Archive Search

The public archive search only returns results from published meetings. Unpublished meetings remain private and are not indexed for public search.

Advanced Editor Features

Professional tools for efficient transcript editing, designed for transcriptionists and court reporters.

Waveform Visualization

The editor displays the audio waveform alongside the transcript. Click anywhere on the waveform to seek to that position. The playhead tracks the current position as audio plays.
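Click-to-seek is a proportional mapping from horizontal position to time. The sketch below assumes the waveform shows the whole recording at a uniform scale (no zoom or scrolling); the function name is hypothetical.

```python
def seek_time(click_x: float, waveform_width: float, duration_s: float) -> float:
    """Map a horizontal click position on the waveform to a playback time in seconds."""
    if waveform_width <= 0:
        raise ValueError("waveform_width must be positive")
    # Clamp so clicks just outside the drawn area land on the start or end.
    fraction = min(max(click_x / waveform_width, 0.0), 1.0)
    return fraction * duration_s
```

For example, a click 250 px into a 1000 px waveform of a one-hour recording seeks to 900 s (15 minutes).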

Playback Speed

Adjust playback speed from 0.5x to 3x. Useful for slowing down fast speech or speeding through clear sections.

Configurable Hotkeys

Speaker Map ships with 12 default keyboard shortcuts for common actions, each rebindable per editor. The defaults include:

Action             Default Shortcut
Play / Pause       F5
Rewind 5s          F3
Forward 5s         F4
Speed Up           F7
Slow Down          F6
Insert Timestamp   F8
Add Index Point    F9
Add Annotation     F10
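Per-editor rebinding can be modeled as the defaults overlaid with that editor's overrides, rejecting any key assigned to two actions. The action identifiers and the `effective_hotkeys` helper below are hypothetical, and only the eight defaults listed above are included (the remaining four are not documented here).

```python
# Hypothetical action identifiers for the documented defaults.
DEFAULT_HOTKEYS = {
    "play_pause": "F5",
    "rewind_5s": "F3",
    "forward_5s": "F4",
    "slow_down": "F6",
    "speed_up": "F7",
    "insert_timestamp": "F8",
    "add_index_point": "F9",
    "add_annotation": "F10",
}


def effective_hotkeys(overrides: dict[str, str]) -> dict[str, str]:
    """Apply one editor's rebindings on top of the defaults and reject key clashes."""
    unknown = set(overrides) - set(DEFAULT_HOTKEYS)
    if unknown:
        raise KeyError(f"Unknown actions: {sorted(unknown)}")
    bindings = {**DEFAULT_HOTKEYS, **overrides}
    seen: dict[str, str] = {}
    for action, key in bindings.items():
        if key in seen:
            raise ValueError(f"{key} is bound to both {seen[key]} and {action}")
        seen[key] = action
    return bindings
```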

Foot Pedal Support

Speaker Map supports USB transcription foot pedals via WebHID. Standard 3-button foot pedals are mapped to:

  • Left pedal — Rewind
  • Center pedal — Play / Pause
  • Right pedal — Fast forward

Multi-Track Audio

For multi-channel recordings, the editor provides per-channel controls:

  • Volume — Adjust each channel independently
  • Mute — Silence individual channels
  • Solo — Listen to one channel in isolation
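Volume, mute, and solo combine into an effective gain per channel. This sketch follows common mixing-console behavior, which is an assumption about Speaker Map: when any channel is soloed, only soloed channels are heard, and a muted channel stays silent even if it is also soloed. `Channel` and `effective_gains` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    volume: float = 1.0  # linear gain, 0.0 to 1.0
    mute: bool = False
    solo: bool = False


def effective_gains(channels: list[Channel]) -> list[float]:
    """Gain actually applied to each channel after mute and solo are considered."""
    any_solo = any(ch.solo for ch in channels)
    gains = []
    for ch in channels:
        # Mute always wins; with any solo active, non-soloed channels are silenced.
        audible = not ch.mute and (ch.solo or not any_solo)
        gains.append(ch.volume if audible else 0.0)
    return gains
```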

Index Points

Mark key moments on the video timeline. Index point categories:

  • topic, speaker, bill, motion, vote, agenda

Annotations

Add notes and markers to specific positions in the transcript. Annotation types:

  • bookmark — Save a position for later
  • flag — Mark something that needs attention
  • note — Add a text comment
  • correction — Mark a needed correction
  • question — Flag something for clarification
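For integrations or exports, the index point categories and annotation types map naturally onto enumerations. The classes and the `open_items` helper below are a hypothetical data model, not Speaker Map's schema. The helper assumes that flags, corrections, and questions are the annotation types that still need follow-up.

```python
from dataclasses import dataclass
from enum import Enum


class IndexCategory(Enum):
    TOPIC = "topic"
    SPEAKER = "speaker"
    BILL = "bill"
    MOTION = "motion"
    VOTE = "vote"
    AGENDA = "agenda"


class AnnotationType(Enum):
    BOOKMARK = "bookmark"
    FLAG = "flag"
    NOTE = "note"
    CORRECTION = "correction"
    QUESTION = "question"


@dataclass
class IndexPoint:
    time_s: float
    category: IndexCategory
    label: str


@dataclass
class Annotation:
    time_s: float
    kind: AnnotationType
    text: str = ""


def open_items(annotations: list[Annotation]) -> list[Annotation]:
    """Annotations that still need someone's attention, in timeline order."""
    needs_action = {AnnotationType.FLAG, AnnotationType.CORRECTION, AnnotationType.QUESTION}
    return sorted((a for a in annotations if a.kind in needs_action), key=lambda a: a.time_s)
```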

Whisper Models

Model Parameters VRAM Speed Accuracy
tiny 39M ~1 GB ~32x realtime Basic
base 74M ~1 GB ~16x realtime Good
small 244M ~2 GB ~8x realtime Better
medium 769M ~5 GB ~4x realtime Great
large-v3 1550M ~10 GB ~2x realtime Best

Speed is approximate on an RTX 4070 Ti. Actual performance varies with GPU, audio quality, and file length.
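Because speed is given as a multiple of realtime, processing time is roughly the audio length divided by that factor: a 60-minute recording on large-v3 (~2x realtime) takes about 30 minutes on the reference GPU. The sketch below encodes the table's approximate figures to pick the largest model that fits a given amount of VRAM and to estimate processing time. The function names are hypothetical, and in practice you may want to leave some VRAM headroom.

```python
# (name, approx. VRAM in GB, approx. speed as a multiple of realtime), from the table above.
# Ordered from smallest to largest model.
MODELS = [
    ("tiny", 1, 32),
    ("base", 1, 16),
    ("small", 2, 8),
    ("medium", 5, 4),
    ("large-v3", 10, 2),
]


def pick_model(vram_gb: float) -> str:
    """Largest model whose approximate VRAM requirement fits the GPU."""
    fitting = [name for name, vram, _ in MODELS if vram <= vram_gb]
    if not fitting:
        raise ValueError("Not enough VRAM for any model")
    return fitting[-1]


def estimated_minutes(audio_minutes: float, model: str) -> float:
    """Rough processing time: audio length divided by the realtime factor."""
    speed = {name: factor for name, _, factor in MODELS}[model]
    return audio_minutes / speed
```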

Output Formats

TXT (Plain Text)

Simple text output with optional speaker labels. Best for reading and searching.

SRT (SubRip Subtitle)

Standard subtitle format with timestamps. Compatible with video players and editors.

VTT (WebVTT)

Web-native subtitle format. Ideal for web players and streaming platforms.

JSON (Structured Data)

Full metadata including timestamps, confidence scores, speaker IDs, and word-level timing.
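If you post-process JSON output, a common task is turning it back into speaker-labeled text. The field names used here (`segments`, `speaker`, `text`) are assumptions for illustration; check an actual exported file for the real schema before adapting this sketch.

```python
import json
from typing import List, Optional, Tuple


def json_to_text(raw: str, speaker_labels: bool = True) -> str:
    """Render a JSON transcript as plain text, merging consecutive segments by speaker.

    Field names ("segments", "speaker", "text") are assumed for illustration.
    """
    segments = json.loads(raw)["segments"]
    paragraphs: List[Tuple[Optional[str], List[str]]] = []
    for seg in segments:
        speaker = seg.get("speaker")
        if paragraphs and paragraphs[-1][0] == speaker:
            paragraphs[-1][1].append(seg["text"].strip())
        else:
            paragraphs.append((speaker, [seg["text"].strip()]))
    lines = []
    for speaker, texts in paragraphs:
        body = " ".join(texts)
        lines.append(f"{speaker}: {body}" if speaker_labels and speaker else body)
    return "\n\n".join(lines)
```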

License Management

Activating Your License

When you start the backend server or frontend client with a license key, it automatically activates on that machine. Each license has a limited number of activations:

  • Backend Server: 1 activation (1 server)
  • Frontend Client: 1 activation (1 workstation)
  • Bundle: 2 activations (1 server + 1 workstation)

Moving to a New Machine

If you need to move your license to new hardware:

  1. Log in to your account at think3d.ca/portal/licenses
  2. Find the license and click "Deactivate" for the old machine's activation
  3. Install Speaker Map on the new machine and enter your license key

No need to contact support — self-service re-registration is available 24/7.

Offline Usage

Speaker Map validates your license online every 7 days. If the machine is offline when a validation is due, you have a 30-day grace period before the license needs to re-validate. The software continues working normally during this period.
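A minimal sketch of how the two periods relate, assuming the 30-day grace period is counted from the last successful validation (the text above does not say whether it starts then or when the 7-day check is missed). `license_state` is a hypothetical helper, not part of Speaker Map.

```python
from datetime import date, timedelta

VALIDATION_INTERVAL = timedelta(days=7)
GRACE_PERIOD = timedelta(days=30)  # assumed to run from the last successful check


def license_state(last_validated: date, today: date) -> str:
    """Classify the license given the date of the last successful online check."""
    elapsed = today - last_validated
    if elapsed < VALIDATION_INTERVAL:
        return "valid"             # next check not yet due
    if elapsed <= GRACE_PERIOD:
        return "grace"             # check overdue, software keeps working
    return "needs_validation"      # must reconnect to re-validate
```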

Supported File Formats

Audio

MP3, WAV, FLAC, OGG, M4A, AAC, WMA, AIFF

Video

MP4, MKV, AVI, MOV, WMV, FLV, WebM

Audio is extracted automatically from video files before transcription.
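The extension lists above can be used to pre-check a batch before adding it. The sketch also shows one common way to pull the audio track out of a video with ffmpeg (`-vn` drops the video stream, `-ac 1 -ar 16000` writes 16 kHz mono, the sample rate Whisper models work at). It illustrates the general technique and is not necessarily the exact command Speaker Map runs internally; the function names are hypothetical.

```python
from pathlib import Path

AUDIO_EXTS = {".mp3", ".wav", ".flac", ".ogg", ".m4a", ".aac", ".wma", ".aiff"}
VIDEO_EXTS = {".mp4", ".mkv", ".avi", ".mov", ".wmv", ".flv", ".webm"}


def classify(path: str) -> str:
    """Return 'audio', 'video', or 'unsupported' based on the file extension."""
    ext = Path(path).suffix.lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    return "unsupported"


def extract_audio_cmd(video: str, out_wav: str) -> list[str]:
    """ffmpeg arguments that drop the video stream and write 16 kHz mono WAV."""
    return ["ffmpeg", "-y", "-i", video, "-vn", "-ac", "1", "-ar", "16000", out_wav]
```

Run the command with `subprocess.run(extract_audio_cmd("meeting.mp4", "meeting.wav"), check=True)`.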

FAQ

How accurate is the transcription?

Using the large-v3 model, Speaker Map achieves word error rates (WER) comparable to commercial cloud services, typically under 5% for clear English audio. Accuracy improves with higher-quality audio and with larger models, which need more GPU memory.
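WER counts the word substitutions (S), deletions (D), and insertions (I) needed to turn a transcript into a correct reference, divided by the number of reference words (N). To check accuracy on your own recordings, you can compare a Speaker Map transcript against a hand-corrected reference with a sketch like this. It only lowercases and splits on whitespace; real evaluations usually also normalize punctuation and numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed with a word-level Levenshtein (edit distance) table, one row at a time.
    """
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the previous row: distances from the first i-1 reference words
    # to each prefix of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)
```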

What languages are supported?

Whisper supports 99 languages for transcription and translation to English. Language can be auto-detected or manually specified.

Can multiple users share one backend server?

Yes. One backend server can handle requests from multiple frontend clients. Each frontend client needs its own license, but the backend server only needs one license.

Is my data sent to the cloud?

No. All transcription processing happens locally on your hardware. The only network communication is periodic license validation with our server (a small metadata check — no audio data is transmitted).

What GPU do I need?

Any NVIDIA GPU with 8GB+ VRAM and CUDA support. Recommended: RTX 3060 12GB or better. For the large-v3 model, 10GB+ VRAM is recommended.

Can Speaker Map manage legislative meetings?

Yes. Speaker Map includes a full legislative platform with meeting management, attendee tracking, agenda building, roll call, motions, voting with automatic tallies, and formal minutes export to DOCX/PDF. It's designed for city councils, legislatures, and government bodies.

Does it support foot pedals?

Yes. Speaker Map supports USB transcription foot pedals via WebHID. Standard 3-button pedals are automatically mapped to rewind, play/pause, and fast-forward. No drivers or additional software needed.

Can we publish meetings publicly?

Yes. The archive portal lets you publish meetings with synced video, searchable transcript, closed captions, and index points. Citizens can browse and search published meetings. You can also embed the player on external websites via iframe.

Need help? Contact us at [email protected]