"Why does the exact same script sound so different today than it did yesterday?"
It’s a question every ElevenLabs user asks at least once.
In our experience, roughly 80% of those inconsistencies come down to how you craft your prompt, meaning the text you feed in.
Hi everyone, and welcome to Sonetho ⚡
At ElevenLabs, a "prompt" isn't some complex code—it’s simply the text you feed the AI to convert into speech.
But here’s the secret: that text is more than just a "blob of words."
Take the phrase, "That is absolutely amazing."
If you paste it in as-is, the AI reads it plainly.
But add one emotional tag, and it performs the line with genuine astonishment.
A single digit, a period, or a hyphen can completely transform the final output.
Having fine-tuned ElevenLabs for nearly three years, we’ve decided to take a deep dive into "Prompt Engineering" just for you.
We’re covering everything: v3 audio tags, pronunciation fixes, pacing, and ready-to-copy templates. Read on!
📌 First: Prompts behave differently depending on the 'model'
Before we jump in, let’s clear one thing up.
The "audio tags" we’re about to discuss—commands like [excited] or [laughs] in brackets—only work on the Eleven v3 model.
What happens if you use these tags in Multilingual v2?
The model won't "act" them out. It doesn't recognize them as performance cues, so it may read the tag aloud as literal text or skip it entirely. 😅
(Our team has run side-by-side experiments comparing v2 and v3; you can listen to the differences in our Eleven v3 vs. v2 Comparison.)
The TL;DR:
• Want emotional or sound effect tags? Use v3.
• Need long-form stability? Use v2 (it avoids the tonal drift that can occasionally happen in v3).
• Need pronunciation fixes or specific pacing (hyphens)? These work on both v2 and v3.
For a deeper dive into the pros and cons of each model, check out our User-Proven v2 vs. v3 Guide.
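If you reuse one script across models, a small pre-processing step can keep bracket tags out of non-v3 requests. The sketch below is only an illustration: the function name `prepare_text` and the model identifier strings are assumptions on our part, so check them against your own setup.

```python
import re

# Assumed model identifiers; confirm them against your own account.
TAG_AWARE_MODELS = {"eleven_v3"}

# A bracketed tag made of letters and spaces, e.g. [excited] or [clears throat],
# plus any spaces or tabs that follow it on the same line.
# Caution: this also matches ordinary bracketed words such as [sic].
AUDIO_TAG = re.compile(r"\[[A-Za-z][A-Za-z ]*\][ \t]*")


def prepare_text(text: str, model_id: str) -> str:
    """Return text unchanged for v3; strip audio tags for every other model."""
    if model_id in TAG_AWARE_MODELS:
        return text
    stripped = AUDIO_TAG.sub("", text)
    # Squeeze doubled spaces and trim each line, keeping line breaks intact.
    lines = [re.sub(r" {2,}", " ", line).strip() for line in stripped.split("\n")]
    return "\n".join(lines)
```

Calling `prepare_text(script, "eleven_multilingual_v2")` gives you a tag-free copy for v2, while the v3 version of the script stays untouched.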
🎭 1. The v3 Audio Tag Cheat Sheet (Emotions & Effects)
Audio tags are straightforward: simply type your performance instruction in English inside brackets [ ].
While the documentation notes that tags influence the delivery of the following text, we’ve found that placing them at the start of a sentence sets the overall tone, while placing them in the middle triggers specific effects like laughter or a sigh.
(There’s no "absolute rule," but this approach generally yields the most natural results.)
Remember these two golden rules:
• Use English for tags. (e.g., [whispers] works significantly better than any non-English equivalent.)
• Tags are not spoken aloud. They are performance directions, so the AI knows not to narrate the word "whispers." (This is a v3-specific feature.)
① Emotional Tone Tags (Best at the start of a sentence)
| Tag | Effect | Best Used For |
|---|---|---|
| [excited] | Upbeat, enthusiastic | Announcements, product launches |
| [sad] | Subdued, mournful | Emotional storytelling |
| [angry] | Furious, intense | Character dialogue, drama |
| [whispers] | Soft, hushed | Secrets, ASMR, suspense |
| [sarcastic] | Snarky, witty | Comedy, character personality |
| [curious] | Inquisitive tone | Questioning narration |
| [nervous] | Anxious, shaky | High-stakes scenes |
| [calm] | Stable, soothing | Meditation, tutorials |
Example Input (v3):
[excited] Everyone, the new product is finally here!
[whispers] And here’s a secret: we’ve got a special discount just for today.
👉 The first sentence sounds bright and energetic; the second drops into an intimate whisper. Being able to shift tones mid-script is exactly what makes v3 so powerful.
② Non-Verbal Effect Tags (Best mid-sentence)
These go beyond changing "tone"—they generate actual sounds like laughter, sighs, or clearing your throat.
| Tag | Sound | Best Position |
|---|---|---|
| [laughs] | Laughter | Middle/End |
| [chuckles] | Light, soft laugh | Middle/End |
| [sighs] | Heavy breath | Start/Middle |
| [gasps] | Sharp intake of breath | Start |
| [clears throat] | Throat clearing | Start |
| [exhales] | Deep exhale | Middle/End |
Example Input (v3):
Oh, I totally forgot about that. [sighs] I guess I’ll have to start over.
So, here’s what I did yesterday… [laughs] it was such a silly mistake.
📌 Lab Tip: Don't over-tag! ⚡
You can combine tags, but stacking too many in one sentence often leads to instability (weird speed, artifacts) or the AI ignoring them entirely. We recommend sticking to one tag per sentence as a rule of thumb. Less is more!
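To catch over-tagging before you spend credits, you could run a quick check like the one below. It is a rough sketch, not an official tool: the helper name `over_tagged_sentences` is hypothetical, and the sentence splitting is deliberately naive (it breaks after ., !, ?, or … followed by whitespace).

```python
import re

# A bracketed tag made of letters and spaces, e.g. [laughs] or [clears throat].
AUDIO_TAG = re.compile(r"\[[A-Za-z][A-Za-z ]*\]")
# Naive sentence boundary: whitespace that follows ., !, ?, or the ellipsis character.
SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")


def over_tagged_sentences(script: str, max_tags: int = 1) -> list:
    """Return the sentences that carry more than max_tags audio tags."""
    flagged = []
    for sentence in SENTENCE_END.split(script.strip()):
        if len(AUDIO_TAG.findall(sentence)) > max_tags:
            flagged.append(sentence)
    return flagged
```

An empty list means every sentence stays within the one-tag rule of thumb. Anything returned is worth a second look before you generate.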
🔤 2. Pronunciation: When AI gets names or numbers wrong
Even with perfect tone, if your AI reads "SaaS" as "Sass" instead of "S-A-A-S," your content is ruined. 😭 Pronunciation requires a different strategy than emotional tone.
Here are three ways to fix it:
① Phonetic Spelling (The Quick Fix)
Simply spell out the word the way it should sound in your script.
ChatGPT → Chat G P T
API → A P I
$22 → Twenty-two dollars (If numbers are read inconsistently)
This is perfect for one-off scripts.
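If you find yourself retyping the same fixes, a small substitution table can apply them for you. This is a minimal sketch: `SPELLINGS` and `apply_spellings` are names we made up for illustration, and the entries are simply the examples from the list above.

```python
import re

# Example replacements taken from the list above; extend with your own terms.
SPELLINGS = {
    "ChatGPT": "Chat G P T",
    "API": "A P I",
    "$22": "Twenty-two dollars",
}


def apply_spellings(text: str, spellings: dict) -> str:
    """Replace each term with its phonetic spelling, matching whole terms only."""
    # Longest terms first, so a long term is handled before any shorter term.
    for term in sorted(spellings, key=len, reverse=True):
        # The lookarounds stop "API" from matching inside "APIs" or "$22" inside "$220".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        text = re.sub(pattern, lambda m, t=term: spellings[t], text)
    return text
```

Run it over the script just before generation so your source text stays readable.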
② Use Hyphens for Separation
If two words bleed together when spoken (e.g., "fast track" coming out as one blur), insert a hyphen to force a clean break.
fast-track / Chat-GPT / A-P-I
Hyphens act as both a pronunciation spacer and a subtle, short breath.
③ Pronunciation Dictionary (The Pro Move)
If you have specific brand names or industry terms you use constantly, don't waste time re-spelling them every time. Add them to your Pronunciation Dictionary once, and ElevenLabs will automatically handle them for every project.
🚨 Common Pitfall (Save your credits!)
In the dictionary, the Alias field should hold the phonetic spelling, written the way the word should sound in English. ✅ e.g., SaaS → S A A S
※ Important: Do not use hyphens in the Alias field. Just use the phonetic spelling (e.g., "S A A S"). Reserve hyphens for pacing within your main script text.
Check out our Complete Pronunciation Guide for a step-by-step walkthrough on building your dictionary.
Note: v3 is much better at handling patterns like "$22" → "twenty-two dollars" than v2. However, it can sometimes oscillate between American and British pronunciations for the same word. If you need 100% consistency, the dictionary is your best friend.
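If you keep your dictionary as a file, alias rules like the one above can be written in the W3C Pronunciation Lexicon (PLS) format. The sketch below builds such a file as a string and enforces the no-hyphen rule for aliases. Treat it as an assumption-laden example: `build_pls` is a hypothetical helper, and you should confirm in the ElevenLabs documentation that your upload path accepts this format before relying on it.

```python
from xml.sax.saxutils import escape


def build_pls(aliases: dict, lang: str = "en-US") -> str:
    """Build a W3C PLS lexicon string with one alias rule per term."""
    lexemes = []
    for term, alias in aliases.items():
        # Per the tip above: aliases use plain phonetic spelling, no hyphens.
        if "-" in alias:
            raise ValueError(f"alias for {term!r} contains a hyphen: {alias!r}")
        lexemes.append(
            "  <lexeme>\n"
            f"    <grapheme>{escape(term)}</grapheme>\n"
            f"    <alias>{escape(alias)}</alias>\n"
            "  </lexeme>"
        )
    header = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<lexicon version="1.0" '
        'xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" '
        f'alphabet="ipa" xml:lang="{lang}">'
    )
    return "\n".join([header, *lexemes, "</lexicon>"]) + "\n"
```

For example, `build_pls({"SaaS": "S A A S"})` returns a lexicon with a single alias rule, which you could save as a `.pls` file.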
⏸️ 3. Pacing & Breathing: The Magic of Punctuation
Punctuation is your most powerful tool for controlling flow. The AI interprets pauses, stops, and transitions based on these markers.
① Understanding Pause Lengths
| Punctuation | Effect | Pause Length |
|---|---|---|
| Comma , | Short, smooth break | Very short |
| Period . | Sentence end, tone drop | Medium |
| Line Break (Enter) | Paragraph break, reset | Long |
| Question Mark ? | Upward inflection | Medium |
| Exclamation ! | Emphasis | Medium |
| Ellipsis … | Lingering, hesitation | Long/Soft |
| Hyphen - | Micro-break | Very short |
② The Hyphen (-) — Our Secret Weapon
When a comma feels too long, but you don't want words to run together, the hyphen is your perfect bridge.
Fixing Blending: fast-track (prevents words from merging)
Micro-Breaths: Well, let’s begin- (creates a natural pause before a shift)
Building Suspense: It is-quite-unbelievable.
📌 v3 Note: Line breaks are a double-edged sword.
v3 re-samples the voice profile slightly at every paragraph break. If you need long-form consistency, keep your paragraphs long. If you want distinct mood changes, use line breaks to your advantage.
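One way to act on this is to merge incidental line breaks while keeping the deliberate ones. The sketch below assumes a convention of our own: a single line break is treated as incidental and joined with a space, while a blank line marks an intentional paragraph (mood) change. The helper name `merge_soft_breaks` is hypothetical.

```python
import re


def merge_soft_breaks(script: str) -> str:
    """Join single line breaks with a space; keep blank-line breaks as paragraphs."""
    # Split only where at least one blank line separates two blocks of text.
    paragraphs = re.split(r"\n\s*\n", script.strip())
    # Inside each paragraph, collapse all runs of whitespace (including newlines).
    merged = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in merged if p)
```

A script pasted with one sentence per line becomes a single long paragraph, and only the breaks you marked with a blank line survive.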
③ Precise Pauses — They differ by model! ⚡
When you need a deliberate pause of a set length, the method depends on your model:
| Situation | Method | Notes |
|---|---|---|
| v3 Model | [pause] · [short pause] · [long pause] | v3 exclusive tags |
| v2 · Turbo · Flash | <break time="1.5s" /> | SSML; not supported in v3 |
| Studio Editor | Use the 'Insert Pause' button | No manual typing required |
🚨 Common Mistake: Using <break> in v3.
v3 does not support SSML tags and will simply ignore them. Use [pause] for v3. Conversely, if you use [pause] in v2, it won't work—use the <break> tag instead.
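If you move scripts between models, you could convert pause markup automatically instead of fixing it by hand. The sketch below is illustrative only: the function names are hypothetical, and the seconds assigned to each v3 tag in `PAUSE_SECONDS` are our own guesses, since the tags do not promise exact durations.

```python
import re

# SSML break with a duration in seconds, e.g. <break time="1.5s" />
BREAK_TAG = re.compile(r'<break\s+time="(\d+(?:\.\d+)?)s"\s*/>')
PAUSE_TAG = re.compile(r"\[(short pause|pause|long pause)\]")

# Assumed durations for the v3 pause tags, in seconds; tune these by ear.
PAUSE_SECONDS = {"short pause": 0.5, "pause": 1.0, "long pause": 2.0}


def to_v3_pauses(text: str) -> str:
    """Swap SSML <break> tags for the v3 pause tag with the closest assumed length."""
    def swap(match):
        seconds = float(match.group(1))
        name = min(PAUSE_SECONDS, key=lambda n: abs(PAUSE_SECONDS[n] - seconds))
        return f"[{name}]"
    return BREAK_TAG.sub(swap, text)


def to_ssml_breaks(text: str) -> str:
    """Swap v3 pause tags for SSML <break> tags (for v2, Turbo, Flash)."""
    return PAUSE_TAG.sub(
        lambda m: f'<break time="{PAUSE_SECONDS[m.group(1)]}s" />', text
    )
```

Use `to_v3_pauses` when moving a v2 script to v3 and `to_ssml_breaks` for the reverse direction, then preview the audio, because the mapping is approximate.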
📋 4. Practice Templates (Copy & Paste)
① YouTube Informational Narration
[curious] Did you know about this?
What I’m about to show you is-surprisingly simple, yet effective.
[excited] Let’s get started!
Tip: Hook the audience with [curious], and build tension with [excited]. Use hyphens for natural micro-pauses.
② Character Dialogue
[whispers] You mustn't tell anyone…
[nervous] The truth is, I was there that night.
[sighs] I suppose it’s too late for regrets.
Tip: Short sentences paired with emotional tags are the v3 "sweet spot." Perfect for gaming dubs or animation.
③ Product Ads (15s)
[excited] One day only! The price drops now.
Don’t wait-or you might miss out.
[calm] Make a choice that’s smarter than the rest.
Tip: High energy → Transition → Calm resolution.
④ Calm Guide / Instructions
Hello, customer.
I will now guide you through the setup, step-by-step.
First, press and hold the power button for three seconds.
Tip: No tags are needed here—clean punctuation is key. Use v2 for long, stable instructional content.
✅ Final Checklist
Using emotion/effect tags? → Are you using v3?
Used English for tags and kept to 1 tag per sentence?
Fixed pronunciation issues with phonetic spelling or hyphens?
Repeating brands? → Registered in the Pronunciation Dictionary?
Audio inconsistent? → Reduced line breaks or switched to v2?
Previewed the audio? (Always listen, don't just read!)
❓ FAQ
Q. I used [excited] but it just read out the word "excited." Why?
You are likely using v2. Audio tags work only on v3. Switch your model and try again.
Q. Can I use other languages for tags?
Stick to English (e.g., [whispers]). The v3 model is trained on these specific English tags to ensure the best performance.
Q. How do I fix numbers being read incorrectly?
Spell them out (e.g., "Twenty-two"). For recurring technical terms, use the Pronunciation Dictionary.
Q. How do I keep the voice consistent for long scripts?
v3 can have subtle tonal shifts between paragraphs. For absolute consistency, minimize line breaks or switch to v2.
🎁 Closing Thoughts
ElevenLabs is all about "how you direct the engine."
The same text can be a dry reading or a high-end performance depending on your prompt. Take these templates, test them out, and breathe life into your projects. You’ll be surprised at how much difference these small tweaks make!
Stay tuned for more tips in our next post.
This has been Sonetho ⚡