Text-to-Speech for AI Voiceovers - Workflow

Generate realistic AI voiceovers directly inside ComfyUI with this Kokoro TTS workflow. It produces text-to-speech audio for free and entirely offline, and it can blend multiple speaker models into a custom voice. It is well suited to narration, storytelling, and character dialogue, all without cloud services.

🎯 Features

Free & Local – No online services required; runs fully offline using Kokoro.
Multiple Speaker Voices – Includes support for models like am_onyx and am_adam.
Voice Blending – Combine two voices via KokoroSpeakerCombiner for a unique speech tone.
Text-to-Speech Node – KokoroGenerator turns your typed text into audio using the selected voice.
Audio Export – Output is saved as a playable .wav file via SaveAudio.
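
To make the Voice Blending feature more concrete, here is a minimal Python sketch of one way two voices could be mixed. It assumes each voice is represented as a numeric vector and that blending is a simple weighted average; this is an illustration of the idea, not the confirmed internals of KokoroSpeakerCombiner. The function name blend_voices and the three-number vectors are hypothetical.

```python
def blend_voices(voice_a, voice_b, weight):
    """Weighted average of two voice vectors (hypothetical helper).

    weight=1.0 returns voice_a unchanged, weight=0.0 returns voice_b.
    Assumption: blending is linear interpolation of voice vectors.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if len(voice_a) != len(voice_b):
        raise ValueError("voice vectors must have the same length")
    return [weight * a + (1.0 - weight) * b for a, b in zip(voice_a, voice_b)]


# Toy stand-ins for am_onyx and am_adam; real voice data is much larger.
onyx = [1.0, 0.0, 2.0]
adam = [0.0, 1.0, 4.0]

blended = blend_voices(onyx, adam, 0.5)
print(blended)  # [0.5, 0.5, 3.0]
```

Under this assumption, moving the ratio toward 1.0 makes the result sound more like the first voice, and moving it toward 0.0 favors the second.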

💡 Use Cases

YouTube Narration – Generate commentary and explanations for videos.
Game Development – Create character dialogue with distinct tones.
Voiceover for Animation – Narrate animated scenes or intros locally.
Storytelling & Audiobooks – Read out long texts with expressive voice control.
Virtual Assistants – Add speech to AI bots or desktop assistants.

⚙️ How It Works

Load Speakers – Use two KokoroSpeaker nodes to select different voices (e.g. am_onyx, am_adam).
Blend Voices (Optional) – Use KokoroSpeakerCombiner to merge the two voices with an adjustable ratio.
Type Your Text – Enter your line in KokoroGenerator and choose the speed and language settings.
Generate Speech – Connect the combined speaker to KokoroGenerator to produce the voice audio.
Save Audio – Use SaveAudio to export the voiceover as a .wav file.
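
The steps above can be sketched as a node graph written in the style of ComfyUI's API workflow format, where each node has a class_type and its inputs either hold a value or point at another node as [node_id, output_index]. The node class names come from this workflow; every input name (speaker_name, speaker_a, speaker_b, weight, text, speaker, speed, lang) and every value is an assumption for illustration, so check the actual nodes before reusing it.

```python
# Hypothetical wiring of the five steps; input names and values are assumed.
workflow = {
    "1": {"class_type": "KokoroSpeaker", "inputs": {"speaker_name": "am_onyx"}},
    "2": {"class_type": "KokoroSpeaker", "inputs": {"speaker_name": "am_adam"}},
    "3": {"class_type": "KokoroSpeakerCombiner",
          "inputs": {"speaker_a": ["1", 0], "speaker_b": ["2", 0], "weight": 0.5}},
    "4": {"class_type": "KokoroGenerator",
          "inputs": {"text": "Hello from Kokoro.", "speaker": ["3", 0],
                     "speed": 1.0, "lang": "English"}},
    "5": {"class_type": "SaveAudio",
          "inputs": {"audio": ["4", 0], "filename_prefix": "voiceover"}},
}


def execution_order(graph, target):
    """Return node ids in the order they must run to produce `target`."""
    order, seen = [], set()

    def visit(node_id):
        if node_id in seen:
            return
        seen.add(node_id)
        for value in graph[node_id]["inputs"].values():
            # A two-item list whose first item is a node id is a link.
            if isinstance(value, list) and len(value) == 2 and value[0] in graph:
                visit(value[0])
        order.append(node_id)

    visit(target)
    return order


order = execution_order(workflow, "5")
print([graph_node for graph_node in order])                 # ['1', '2', '3', '4', '5']
print([workflow[n]["class_type"] for n in order][-2:])      # ['KokoroGenerator', 'SaveAudio']
```

To skip the optional blending step in this sketch, the speaker input of node 4 would point straight at one KokoroSpeaker node, for example ["1", 0].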

Credits: pixaroma
