Text-to-Speech for AI Voiceovers

Run this workflow on InstaSD

Get started in minutes! Run this ComfyUI workflow online - no setup required.

Description

Generate realistic AI voiceovers directly inside ComfyUI with this Kokoro TTS workflow. It produces free, offline text-to-speech audio and can blend multiple speaker models into custom voices. It is well suited to narration, storytelling, and character dialogue, all without cloud services.


🎯 Features

  • Free & Local – No online services required; runs fully offline using Kokoro.
  • Multiple Speaker Voices – Includes support for models like am_onyx and am_adam.
  • Voice Blending – Combine two voices via KokoroSpeakerCombiner for a unique speech tone.
  • Text-to-Speech Node – KokoroGenerator turns your typed text into audio using the selected voice.
  • Audio Export – Output is saved as a playable .wav file via SaveAudio.
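
To make the voice-blending idea concrete, here is a minimal sketch of mixing two voices by a ratio. It assumes each voice can be represented as a list of numbers (a voice embedding) and that blending is a simple weighted average. Both are assumptions for illustration, not a confirmed description of what KokoroSpeakerCombiner does internally, and the function name blend_voices and the toy vectors are hypothetical.

```python
def blend_voices(voice_a, voice_b, weight):
    """Return a weighted average of two equal-length voice vectors.

    ASSUMPTION: voices are plain numeric vectors and blending is linear.
    weight=1.0 yields voice_a unchanged; weight=0.0 yields voice_b.
    """
    if len(voice_a) != len(voice_b):
        raise ValueError("voice vectors must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return [weight * a + (1.0 - weight) * b for a, b in zip(voice_a, voice_b)]


if __name__ == "__main__":
    # Toy two-dimensional "voices", purely illustrative.
    onyx_like = [1.0, 0.0]
    adam_like = [0.0, 1.0]
    print(blend_voices(onyx_like, adam_like, 0.5))
```

Under these assumptions, moving the weight toward 1.0 makes the result sound more like the first voice, and moving it toward 0.0 favors the second.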

💡 Use Cases

  • YouTube Narration – Generate commentary and explanations for videos.
  • Game Development – Create character dialogue with distinct tones.
  • Voiceover for Animation – Narrate animated scenes or intros locally.
  • Storytelling & Audiobooks – Read out long texts with expressive voice control.
  • Virtual Assistants – Add speech to AI bots or desktop assistants.

βš™οΈ How It Works

  1. Load Speakers – Use two KokoroSpeaker nodes to select different voices (e.g. am_onyx, am_adam).
  2. Blend Voices (Optional) – Use KokoroSpeakerCombiner to merge voices with an adjustable ratio.
  3. Type Your Text – Enter your line in KokoroGenerator and choose the speed and language settings.
  4. Generate Speech – Connect the combined speaker into KokoroGenerator to create the voice audio.
  5. Save Audio – Use SaveAudio to export the voiceover as a .wav file.
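
The five steps above can also be written out as a node graph in ComfyUI's API-style JSON, where each node has a class_type and inputs, and a link is a [node_id, output_index] pair. The sketch below builds that graph as a Python dictionary. The node class names come from this workflow, but the input names (speaker_name, speaker_a, speaker_b, weight, text, speaker, speed, lang) and the "English" language value are assumptions that may not match your installed Kokoro nodes, so check them against the node UI before use. The helper name build_voiceover_workflow is hypothetical.

```python
import json


def build_voiceover_workflow(text, voice_a="am_onyx", voice_b="am_adam",
                             weight=0.5, speed=1.0):
    """Build a ComfyUI API-format graph for the blended voiceover.

    ASSUMPTION: all Kokoro input names and the "lang" value below are
    guesses for illustration; verify them in your ComfyUI install.
    """
    return {
        # Step 1: load two speakers.
        "1": {"class_type": "KokoroSpeaker",
              "inputs": {"speaker_name": voice_a}},
        "2": {"class_type": "KokoroSpeaker",
              "inputs": {"speaker_name": voice_b}},
        # Step 2: blend them with an adjustable ratio.
        "3": {"class_type": "KokoroSpeakerCombiner",
              "inputs": {"speaker_a": ["1", 0],
                         "speaker_b": ["2", 0],
                         "weight": weight}},
        # Steps 3 and 4: feed the text and the combined speaker to the generator.
        "4": {"class_type": "KokoroGenerator",
              "inputs": {"text": text,
                         "speaker": ["3", 0],
                         "speed": speed,
                         "lang": "English"}},
        # Step 5: save the generated audio.
        "5": {"class_type": "SaveAudio",
              "inputs": {"audio": ["4", 0],
                         "filename_prefix": "audio/voiceover"}},
    }


if __name__ == "__main__":
    workflow = build_voiceover_workflow("Welcome to the channel.")
    print(json.dumps(workflow, indent=2))
```

To skip blending, connect one KokoroSpeaker directly to the generator by setting the generator's speaker link to ["1", 0] and dropping nodes 2 and 3.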

Credits: pixaroma

Nodes

  • KokoroGenerator
  • SaveAudio
  • KokoroSpeakerCombiner
  • KokoroSpeaker