macOS · VOICE INPUT

SpeakMore

Hold a key, speak naturally, release — your words stream directly into any text field. Cloud AI-powered voice input for macOS with context awareness and zero local dependencies.

VIEW ON GITHUB ↗

SpeakMore is a lightweight macOS menu bar app that converts speech to text using cloud-based multimodal AI models. Unlike traditional speech-to-text tools that chain separate recognition and enhancement stages, SpeakMore sends your audio directly to a multimodal AI model in a single step — the model handles recognition, punctuation, formatting, and domain-specific terminology simultaneously. Text streams in real-time into whatever text field you're focused on, whether it's a code editor, email client, chat app, or document. With context awareness that captures your current app, window title, and document path, plus short-term and long-term memory of your speech patterns, SpeakMore adapts to your vocabulary and writing style over time.

CORE FEATURES

Built for Natural Voice Input

One-Step AI Pipeline

Audio goes directly to a multimodal AI model — no separate speech recognition and text enhancement stages. The model handles transcription, punctuation, formatting, and domain-specific terminology in a single pass for faster, more accurate results.

Real-Time Streaming Insertion

Text streams directly into the focused text field as it's generated — via macOS Accessibility API, keyboard simulation, or clipboard injection. Three insertion methods with automatic fallback ensure text arrives in any application.

Context-Aware Transcription

SpeakMore captures your current app name, window title, and document path before each transcription. This real-time context helps the AI model produce more accurate, contextually appropriate text.

Short & Long-Term Memory

After 10 utterances, SpeakMore builds a short-term context snapshot with your recent topics and vocabulary. Daily profiles capture your identity, primary domains, language habits, and frequently used proper nouns for increasingly personalized transcription.

Custom Prompts & Terminology

Set global or per-app transcription instructions — from faithful transcription to structured output. Add a priority terminology list for proper nouns, technical terms, and domain-specific vocabulary that the AI should always get right.

Multi-Provider Support

Choose from Google Gemini, DashScope (Qwen), OpenRouter, or any OpenAI-compatible endpoint. Bring your own API key — no subscription, no vendor lock-in. Switch models with a single click.

Zero Local Dependencies

No model downloads, no local GPU required. The entire app is ~2MB. Pure cloud-based architecture means you always get the latest AI models without updating the app.

HOW IT WORKS

How SpeakMore Works

Hold

Press and hold the hotkey (default: Fn key). SpeakMore captures your current app context and starts recording from your microphone.

Speak

Speak naturally — the audio is recorded at 16kHz and a real-time equalizer shows your voice level. Say what you mean, in any language.

Release

Release the key. Your audio is sent to the cloud AI model and transcribed text streams directly into the focused text field in real-time.

USE CASES

Who Uses SpeakMore

Developers

Write commit messages, code comments, documentation, and Slack messages by voice. Context awareness ensures technical terms and variable names are transcribed correctly.

Writers & Bloggers

Draft articles, emails, and social media posts at the speed of speech. Long-term memory adapts to your writing style and frequently used phrases.

Professionals

Dictate meeting notes, reports, and project updates directly into any application. Per-app prompts let you tailor output format for different tools.

Multilingual Users

Speak in any language the AI model supports. SpeakMore handles mixed-language input naturally — switch between English and Chinese mid-sentence.

COMPARISON

SpeakMore vs Other Voice Input Tools

Feature	SpeakMore	Traditional Voice Input
Architecture	One-step multimodal AI	Separate STT + enhancement
Context awareness	App, window, document + memory	None
Custom terminology	Per-app prompts + glossary	Limited or none
Model updates	Always latest cloud models	Manual updates required
Local storage	~2MB, no models	Gigabytes of models
Open source	MIT license	Proprietary

~2MB App Size

4+ AI Providers

MIT Open Source License

FAQ

Frequently Asked Questions

What is SpeakMore?

SpeakMore is a free, open-source macOS menu bar app that converts speech to text using cloud-based multimodal AI models. It sends audio directly to AI for one-step transcription and streams the result into any text field on your Mac.

How is SpeakMore different from macOS built-in dictation?

SpeakMore uses multimodal AI models (like Gemini) instead of traditional speech recognition. It understands context — your current app, document, and conversation history — to produce more accurate, naturally formatted text. It also supports custom prompts and terminology lists.

Which AI providers does SpeakMore support?

SpeakMore supports Google Gemini (native API), DashScope/Qwen, OpenRouter, and any custom OpenAI-compatible endpoint. You bring your own API key — there is no subscription fee beyond the API cost.

Does SpeakMore work offline?

No. SpeakMore is a cloud-based tool that requires an internet connection and an API key. The trade-off is zero local model downloads (~2MB total app size) and access to the latest AI models.

What languages does SpeakMore support?

SpeakMore supports any language that the underlying AI model supports. With Gemini, this includes English, Chinese, Japanese, Korean, Spanish, French, German, and dozens more. Mixed-language input is handled naturally.

Is SpeakMore free?

Yes. SpeakMore is free and open source under the MIT license. You only pay for the cloud AI API usage, which is typically very low cost (a few cents per day for moderate use).

What macOS version is required?

SpeakMore requires macOS 14.0 (Sonoma) or later. It also needs Accessibility permission (for text insertion) and Microphone permission.

How does SpeakMore insert text into applications?

SpeakMore uses a three-tier approach: first it tries direct text insertion via the macOS Accessibility API, then keyboard simulation, then clipboard injection as a fallback. This ensures text arrives in any application.

Key Takeaways

SpeakMore is a free, open-source macOS menu bar app for cloud AI-powered voice input by ByuTech
One-step pipeline: audio goes directly to a multimodal AI model (Gemini, Qwen, etc.) for transcription — no separate STT stage
Real-time streaming text insertion into any text field via Accessibility API, keyboard simulation, or clipboard fallback
Context-aware: captures current app, window title, document path, plus short-term and long-term memory of speech patterns
Supports custom prompts and terminology lists — global or per-app — for domain-specific accuracy
Multi-provider support: Google Gemini, DashScope, OpenRouter, or any OpenAI-compatible endpoint with BYOAPI
Zero local dependencies — ~2MB app, no model downloads, pure cloud architecture
Requires macOS 14.0+, MIT licensed, source code at github.com/Maxwin-z/SpeakMore-macOS

Start Speaking, Start Writing

SpeakMore is free and open source. Clone from GitHub, build with Xcode, and start dictating.

VIEW ON GITHUB ↗