Mastering ElevenLabs Prompts: Pro Tips for Perfect Voice AI Results

Unlock the full potential of ElevenLabs with our ultimate guide to voice AI prompting. Learn how to use v3 emotional and sound effect tags, master pronunciation adjustments, and control pacing with precise break tags. From Sarah Johnson-style character narration to professional YouTube and advertising templates, we cover everything you need to optimize your audio. Discover expert strategies to stop struggling with inconsistent outputs and start generating high-quality, natural-sounding voiceovers today.

"Why does the exact same script sound so different today than it did yesterday?"

It’s a question every ElevenLabs user asks at least once.
In our experience, roughly 80% of those inconsistencies come down to how you craft your prompt (your input text).

Hi everyone, and welcome to Sonetho!

In ElevenLabs, a "prompt" isn't complex code. It's simply the text you give the AI to convert into speech.
But here's the secret: that text is more than a "blob of words."

 

Take the phrase, "That is absolutely amazing."
If you paste it in as-is, the AI reads it plainly.
But add one emotional tag, and it performs the line with genuine astonishment.
A single digit, a period, or a hyphen can completely transform the final output.

 

Having spent nearly three years fine-tuning our ElevenLabs workflows, we've decided to take a deep dive into "prompt engineering" just for you.
We’re covering everything: v3 audio tags, pronunciation fixes, pacing, and ready-to-copy templates. Read on!

 


📌 First: Prompts behave differently depending on the 'model'

Before we jump in, let’s clear one thing up.
The "audio tags" we’re about to discuss—commands like [excited] or [laughs] in brackets—only work on the Eleven v3 model.

 

What happens if you use these tags in Multilingual v2?
The model won't "act" them out. Because it doesn't recognize them as performance cues, it may read the tag aloud as literal text or simply skip it. 😅
(Our team has run side-by-side experiments comparing v2 and v3; you can listen to the differences in our Eleven v3 vs. v2 Comparison.)
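If you reuse a v3 script on v2, one option is to strip the tags before generating so they can't be read aloud. A minimal Python sketch, assuming every [ ... ] span in your script is an audio tag (strip_audio_tags is our own helper name, not an ElevenLabs API; it will also remove any brackets you meant to keep):

```python
import re

# Matches one bracketed audio tag such as [excited] or [clears throat].
TAG_RE = re.compile(r"\[[^\[\]]+\]")


def strip_audio_tags(script: str) -> str:
    """Remove v3-style audio tags so the script can be reused on v2."""
    without_tags = TAG_RE.sub("", script)
    # Collapse the double spaces left where a tag sat mid-sentence.
    cleaned = re.sub(r"[ \t]{2,}", " ", without_tags)
    # Trim spaces left at the start or end of each line.
    return "\n".join(line.strip() for line in cleaned.splitlines())


print(strip_audio_tags("[excited] Big news! It works. [laughs] Finally."))
# Big news! It works. Finally.
```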

 

The TL;DR:
Want emotional or sound effect tags? Use v3.
Need long-form stability? Use v2 (it avoids the tonal drift that can occasionally happen in v3).
Need pronunciation fixes or specific pacing (hyphens)? These work on both v2 and v3.
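The TL;DR can be written down as a rule of thumb. This is a sketch of our own heuristic, not an ElevenLabs feature: pick_model is a hypothetical helper, and the 5,000-character "long-form" threshold is an arbitrary placeholder you should tune.

```python
import re

# Any bracketed span is treated as a v3 audio tag (an assumption).
TAG_RE = re.compile(r"\[[^\[\]]+\]")


def pick_model(script: str, long_form_chars: int = 5000) -> str:
    """Suggest a model family for a script, following the TL;DR above."""
    if TAG_RE.search(script):
        # Emotion/effect tags only perform on v3, so tags win over length.
        return "v3"
    if len(script) >= long_form_chars:
        # Long, tag-free scripts: favour v2's stability.
        return "v2"
    # Pronunciation fixes and hyphen pacing work on both models.
    return "either"
```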

For a deeper dive into the pros and cons of each model, check out our User-Proven v2 vs. v3 Guide.

 


🎭 1. The v3 Audio Tag Cheat Sheet (Emotions & Effects)

Audio tags are straightforward: simply type your performance instruction in English inside brackets [ ].
While the documentation notes that tags influence the delivery of the following text, we’ve found that placing them at the start of a sentence sets the overall tone, while placing them in the middle triggers specific effects like laughter or a sigh.
(There’s no "absolute rule," but this approach generally yields the most natural results.)

 

Remember these two golden rules:

  • Use English for tags. (e.g., [whispers] works significantly better than any non-English equivalent.)

  • Tags are not spoken aloud. They are performance directions, so the AI knows not to narrate the word "whispers."
    (This is a v3-specific feature.)
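To catch tags that break the "English only" rule before you spend credits, a quick scan helps. A minimal sketch (non_english_tags is our own name); it uses "contains non-ASCII characters" as a rough proxy for "not English", so it flags something like [flüstert] but would miss a non-English word written in plain Latin letters.

```python
import re

# Captures the text inside each [ ... ] tag.
TAG_RE = re.compile(r"\[([^\[\]]+)\]")


def non_english_tags(script: str) -> list[str]:
    """Return tag texts that contain non-ASCII characters."""
    return [tag for tag in TAG_RE.findall(script) if not tag.isascii()]


print(non_english_tags("[whispers] hi [flüstert] there"))
# ['flüstert']
```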

 

① Emotional Tone Tags (Best at the start of a sentence)

| Tag | Effect | Best Used For |
|---|---|---|
| [excited] | Upbeat, enthusiastic | Announcements, product launches |
| [sad] | Subdued, mournful | Emotional storytelling |
| [angry] | Furious, intense | Character dialogue, drama |
| [whispers] | Soft, hushed | Secrets, ASMR, suspense |
| [sarcastic] | Snarky, witty | Comedy, character personality |
| [curious] | Inquisitive tone | Questioning narration |
| [nervous] | Anxious, shaky | High-stakes scenes |
| [calm] | Stable, soothing | Meditation, tutorials |

 

Example Input (v3):

[excited] Everyone, the new product is finally here!
[whispers] And here’s a secret: we’ve got a special discount just for today.

👉 The first sentence sounds bright and energetic; the second drops into an intimate whisper. Being able to shift tones mid-script is exactly what makes v3 so powerful.

 

② Non-Verbal Effect Tags (Best mid-sentence)

These go beyond changing tone: they generate actual sounds such as laughter, sighs, or a throat-clear.

| Tag | Sound | Best Position |
|---|---|---|
| [laughs] | Laughter | Middle/End |
| [chuckles] | Light, soft laugh | Middle/End |
| [sighs] | Heavy breath | Start/Middle |
| [gasps] | Sharp intake of breath | Start |
| [clears throat] | Throat clearing | Start |
| [exhales] | Deep exhale | Middle/End |

 

Example Input (v3):

Oh, I totally forgot about that. [sighs] I guess I’ll have to start over.
So, here’s what I did yesterday… [laughs] it was such a silly mistake.

 

📌 Lab Tip: Don't over-tag! ⚡
You can combine tags, but stacking too many in one sentence often leads to instability (weird speed, artifacts) or the AI ignoring them entirely. We recommend sticking to one tag per sentence as a rule of thumb. Less is more!
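To enforce the one-tag-per-sentence rule of thumb on a long script, a quick check can flag the offending sentences. A minimal sketch: over_tagged is our own name, and the sentence splitter is a naive regex that assumes sentences end in ., !, ? or … followed by whitespace.

```python
import re

TAG_RE = re.compile(r"\[[^\[\]]+\]")
# Split after ., !, ? or an ellipsis when whitespace follows.
SENTENCE_END_RE = re.compile(r"(?<=[.!?…])\s+")


def over_tagged(script: str, max_tags: int = 1) -> list[str]:
    """Return sentences carrying more than `max_tags` audio tags."""
    sentences = SENTENCE_END_RE.split(script.strip())
    return [s for s in sentences if len(TAG_RE.findall(s)) > max_tags]


print(over_tagged("[excited] [laughs] Wow! [sighs] Okay."))
# ['[excited] [laughs] Wow!']
```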

 


🔤 2. Pronunciation: When AI gets names or numbers wrong

Even with perfect tone, if your AI reads "SaaS" as "Sass" instead of "S-A-A-S," your content is ruined. 😭 Pronunciation requires a different strategy than emotional tone.

 

Here are three ways to fix it:

① Phonetic Spelling (The Quick Fix)

Simply spell out the word the way it should sound in your script.

  • ChatGPT → Chat G P T

  • API → A P I

  • $22 → Twenty-two dollars (if numbers are read inconsistently)

This is perfect for one-off scripts.
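When a one-off script is full of acronyms, the spacing trick can be automated. A rough Python sketch (spell_out_acronyms is our own helper): it spaces out every run of two or more capital letters, so it will also spell out acronyms normally said as words (e.g., "NASA") and will miss mixed-case terms like "SaaS". Review the output before generating.

```python
import re

# Two or more consecutive capitals, e.g. "API" or the "GPT" in "ChatGPT".
ACRONYM_RE = re.compile(r"[A-Z]{2,}")


def spell_out_acronyms(text: str) -> str:
    """Space out capital-letter runs: "API" -> "A P I", "ChatGPT" -> "Chat G P T"."""

    def _spell(match: re.Match) -> str:
        letters = " ".join(match.group())
        start = match.start()
        # Separate the run from a preceding lowercase letter ("Chat" + "GPT").
        if start > 0 and text[start - 1].islower():
            return " " + letters
        return letters

    return ACRONYM_RE.sub(_spell, text)


print(spell_out_acronyms("Call the API from ChatGPT."))
# Call the A P I from Chat G P T.
```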

 

② Use Hyphens for Separation

If two words or letters blur together (as with "fast track" or "API"), insert a hyphen to force a clean break:

fast-track / Chat-GPT / A-P-I

Hyphens act as both a pronunciation spacer and a subtle, short breath.

 

③ Pronunciation Dictionary (The Pro Move)

If you have specific brand names or industry terms you use constantly, don't waste time re-spelling them every time. Add them to your Pronunciation Dictionary once, and ElevenLabs will automatically handle them for every project.
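To preview what an alias will do before you add it, you can run a similar substitution locally. A minimal sketch: apply_aliases is hypothetical and not part of any ElevenLabs SDK, and its whole-word, case-sensitive matching is our own assumption, not a description of how ElevenLabs matches dictionary entries.

```python
import re


def apply_aliases(text: str, aliases: dict[str, str]) -> str:
    """Replace whole-word, case-sensitive terms with their spoken aliases."""
    if not aliases:
        return text
    # Longest terms first so a longer entry wins over a shorter prefix.
    terms = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(map(re.escape, terms)) + r")(?!\w)"
    )
    # Single pass, so an alias is never itself re-substituted.
    return pattern.sub(lambda m: aliases[m.group(1)], text)


print(apply_aliases("Our SaaS API is live.", {"SaaS": "S A A S", "API": "A P I"}))
# Our S A A S A P I is live.
```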

 

🚨 Common Pitfall (Save your credits!)
In the dictionary, the Alias field should spell out how the word sounds, using plain English letters. ✅ e.g., SaaS → S A A S

※ Important: Do not use hyphens in the Alias field. Just use the phonetic spelling (e.g., "S A A S"). Reserve hyphens for pacing within your main script text.
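As far as we know, ElevenLabs also accepts pronunciation dictionaries uploaded as PLS (W3C Pronunciation Lexicon Specification) files. The sketch below writes alias-only entries and refuses any alias containing a hyphen, per the rule above. build_pls is our own helper, and the alphabet="ipa" attribute simply mirrors common PLS examples; alias rules don't use it.

```python
from xml.sax.saxutils import escape


def build_pls(aliases: dict[str, str], lang: str = "en-US") -> str:
    """Build a PLS lexicon of alias rules, rejecting hyphenated aliases."""
    lexemes = []
    for grapheme, alias in aliases.items():
        if "-" in alias:
            raise ValueError(f"Alias for {grapheme!r} contains a hyphen: {alias!r}")
        lexemes.append(
            "  <lexeme>\n"
            f"    <grapheme>{escape(grapheme)}</grapheme>\n"
            f"    <alias>{escape(alias)}</alias>\n"
            "  </lexeme>"
        )
    header = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<lexicon version="1.0" '
        'xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" '
        f'alphabet="ipa" xml:lang="{escape(lang)}">'
    )
    return "\n".join([header, *lexemes, "</lexicon>"]) + "\n"


print(build_pls({"SaaS": "S A A S"}))
```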

Check out our Complete Pronunciation Guide for a step-by-step walkthrough on building your dictionary.

 

Note: v3 is much better at handling patterns like "$22" → "twenty-two dollars" than v2. However, it can sometimes oscillate between American and British pronunciations for the same word. If you need 100% consistency, the dictionary is your best friend.
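On v2, or whenever you want certainty, you can pre-spell whole-dollar amounts before generating. A minimal sketch: number_to_words and spell_dollars are our own helpers, they only cover $0 to $999 in American style ("one hundred five", no "and"), and they deliberately leave amounts with cents or thousands separators (e.g., "$22.50", "$1,000") untouched.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# "$" plus 1-3 digits, not followed by more digits, cents, or a thousands group.
DOLLAR_RE = re.compile(r"\$(\d{1,3})(?![.,]?\d)")


def number_to_words(n: int) -> str:
    """Spell out 0-999 in English words, hyphenating 21-99."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0-999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def spell_dollars(text: str) -> str:
    """Rewrite "$22" as "twenty-two dollars" (singular for $1)."""

    def _replace(match: re.Match) -> str:
        amount = int(match.group(1))
        unit = "dollar" if amount == 1 else "dollars"
        return f"{number_to_words(amount)} {unit}"

    return DOLLAR_RE.sub(_replace, text)


print(spell_dollars("It costs $22."))
# It costs twenty-two dollars.
```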

 


⏸️ 3. Pacing & Breathing: The Magic of Punctuation

Punctuation is your most powerful tool for controlling flow. The AI interprets pauses, stops, and transitions based on these markers.

 

① Understanding Pause Lengths

| Punctuation | Effect | Pause Length |
|---|---|---|
| Comma (,) | Short, smooth break | Very short |
| Period (.) | Sentence end, tone drop | Medium |
| Line break (Enter) | Paragraph break, reset | Long |
| Question mark (?) | Upward inflection | Medium |
| Exclamation mark (!) | Emphasis | Medium |
| Ellipsis (…) | Lingering, hesitation | Long/Soft |
| Hyphen (-) | Micro-break | Very short |

 

② The Hyphen (-) — Our Secret Weapon

When a comma feels too long but you don't want the words to run together, the hyphen is the perfect bridge.

  • Fixing Blending: fast-track (prevents words from merging)

  • Micro-Breaths: Well, let’s begin- (creates a natural pause before a shift)

  • Building Suspense: It is-quite-unbelievable.

 

📌 v3 Note: Line breaks are a double-edged sword.
In our testing, v3 seems to re-sample the voice slightly at every paragraph break. If you need long-form consistency, keep your paragraphs long; if you want distinct mood changes, use line breaks to your advantage.
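If you want to apply the "keep paragraphs long" advice automatically, a small pre-processing step can join lines before you paste the script in. A minimal sketch (merge_paragraphs is a hypothetical helper of ours); the optional break_on_tags flag keeps a paragraph break only where a line opens with an audio tag, on the assumption that those lines mark your intentional mood changes.

```python
def merge_paragraphs(script: str, break_on_tags: bool = False) -> str:
    """Join lines into long paragraphs to reduce paragraph-break drift."""
    paragraphs: list[list[str]] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # Blank lines no longer force a paragraph break.
        starts_new = break_on_tags and line.startswith("[")
        if not paragraphs or starts_new:
            paragraphs.append([line])
        else:
            paragraphs[-1].append(line)
    return "\n".join(" ".join(p) for p in paragraphs)


print(merge_paragraphs("Intro.\nMore.\n[sad] Later.\nEnd.", break_on_tags=True))
# Intro. More.
# [sad] Later. End.
```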

 

③ Precise Pauses — They differ by model! ⚡

When you need a pause of a specific length (for example, 1.5 seconds), the method depends on your engine:

| Situation | Method | Notes |
|---|---|---|
| v3 model | [pause] · [short pause] · [long pause] | v3-exclusive tags |
| v2 · Turbo · Flash | <break time="1.5s" /> | SSML; not supported in v3 |
| Studio Editor | Use the 'Insert Pause' button | No manual typing required |

 

🚨 Common Mistake: Using <break> in v3.
v3 does not support SSML tags and will simply ignore them. Use [pause] for v3. Conversely, if you use [pause] in v2, it won't work—use the <break> tag instead.
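If you move scripts between engines, the swap can be scripted in both directions. A sketch under stated assumptions: the durations in PAUSE_SECONDS are our own guesses, not documented values for v3's pause tags, and v2_to_v3 simply picks the tag whose assumed duration is closest. The helper names are hypothetical.

```python
import re

# Assumed durations (seconds) for v3 pause tags; tune these by ear.
PAUSE_SECONDS = {"short pause": 0.5, "pause": 1.0, "long pause": 2.0}

V3_PAUSE_RE = re.compile(r"\[(short pause|long pause|pause)\]")
V2_BREAK_RE = re.compile(r'<break\s+time="([\d.]+)s"\s*/>')


def v3_to_v2(script: str) -> str:
    """Replace v3 pause tags with SSML <break> tags for v2/Turbo/Flash."""
    return V3_PAUSE_RE.sub(
        lambda m: f'<break time="{PAUSE_SECONDS[m.group(1)]}s" />', script
    )


def v2_to_v3(script: str) -> str:
    """Replace each <break> with the v3 pause tag of nearest assumed length."""

    def _nearest(match: re.Match) -> str:
        seconds = float(match.group(1))
        tag = min(PAUSE_SECONDS, key=lambda t: abs(PAUSE_SECONDS[t] - seconds))
        return f"[{tag}]"

    return V2_BREAK_RE.sub(_nearest, script)


print(v3_to_v2("Wait [pause] now."))
# Wait <break time="1.0s" /> now.
```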

 


📋 4. Practice Templates (Copy & Paste)

 

① YouTube Informational Narration

[curious] Did you know about this?
What I’m about to show you is-surprisingly simple, yet effective.
[excited] Let’s get started!

Tip: Hook the audience with [curious], then lift the energy with [excited]. Use hyphens for natural micro-pauses.

 

② Character Dialogue

[whispers] You mustn't tell anyone…
[nervous] The truth is, I was there that night.
[sighs] I suppose it’s too late for regrets.

Tip: Short sentences paired with emotional tags are the v3 "sweet spot." Perfect for gaming dubs or animation.

 

③ Product Ads (15s)

[excited] One day only! The price drops now.
Don’t wait-or you might miss out.
[calm] Make a choice that’s smarter than the rest.

Tip: High energy → Transition → Calm resolution.

 

④ Calm Guide / Instructions

Hello, and welcome.
I will now guide you through the setup, step-by-step.
First, press and hold the power button for three seconds.

Tip: No tags are needed here—clean punctuation is key. Use v2 for long, stable instructional content.

 

✅ Final Checklist

  • Using emotion/effect tags? → Are you using v3?

  • Used English for tags and kept to 1 tag per sentence?

  • Fixed pronunciation issues with phonetic spelling or hyphens?

  • Repeating brands? → Registered in the Pronunciation Dictionary?

  • Audio inconsistent? → Reduced line breaks or switched to v2?

  • Previewed the audio? (Always listen, don't just read!)

 


❓ FAQ

 

Q. I used [excited] but it just read out the word "excited." Why?
You are likely using v2. Audio tags work only on v3. Switch your model and try again.

 

Q. Can I use other languages for tags?
Stick to English (e.g., [whispers]). In our testing, v3 responds far more reliably to English tags than to their equivalents in other languages.

 

Q. How do I fix numbers being read incorrectly?
Spell them out (e.g., "Twenty-two"). For recurring technical terms, use the Pronunciation Dictionary.

 

Q. How do I keep the voice consistent for long scripts?
v3 can have subtle tonal shifts between paragraphs. For absolute consistency, minimize line breaks or switch to v2.

 


🎁 Closing Thoughts

ElevenLabs is all about how you direct the engine.
The same text can be a dry reading or a high-end performance depending on your prompt. Take these templates, test them out, and breathe life into your projects. You’ll be surprised at how much difference these small tweaks make!

 

Stay tuned for more tips in our next post.
This has been Sonetho ⚡