
"Generate a podcast in my own voice?"
ElevenLabs' highly anticipated GenFM (Podcast Creator) feature just received a massive upgrade, throwing down the gauntlet to Google's NotebookLM—the tool that originally sparked the AI podcast craze.
Can ElevenLabs truly dethrone Google in the conversational audio space?
Here at Sonetho, we put it to a rigorous, unbiased test.
🎯 It's still a bit rough, but 'your own voice' is something Google can't match
GenFM still falls a step short of NotebookLM on conversational naturalness, but making a podcast in your own voice with a Professional Voice Clone (PVC) is an ElevenLabs-only weapon that NotebookLM has no answer to. Treat it as a 'draft tool' rather than a perfect finished product, and it's plenty usable even now.
🎵 Try making a podcast draft in your own voice →
PVC can be tried even on the free plan · Still the best for sketching out a script framework
Hey everyone! Welcome back to Sonetho. ⚡
Right now, the absolute hottest trend in AI is the "Audio Overview."
You feed it raw text, and two AI hosts banter back and forth, summarizing complex topics on the fly.
Ever since Google's NotebookLM blew everyone away with scarily realistic interactions (like hosts talking over each other or dropping casual "wait, let me finish" interjections), we've been waiting to see how ElevenLabs—the undisputed king of generative voice—would strike back.
Today, I plugged one of our deep-dive articles into their updated generator to see if it holds up. Here is my honest, unfiltered review.
1. High Expectations: Hosting a Show in Your Own Voice
Right out of the gate, ElevenLabs absolutely crushes Google on customization. While NotebookLM locks you into its default, unchangeable voices, ElevenLabs lets you cast your own Professional Voice Clone (PVC).
[▼ Navigating to Projects → Create a Podcast]

I pasted the URL of our previous deep dive comparing the top three voice synthesis engines and set the hosts to "Sarah" and "Dan"—both powered by our custom-trained, studio-grade PVC models.
[▼ Podcast Lineup: Custom PVC Hosts (Sarah and Dan)]

2. Advanced Customization: Stepping into the Director's Chair
Under the Advanced Configuration panel, you get to step into the director's chair. I gave the AI some specific prompt guidelines to shape the narrative flow.

"Compare the three platforms, but make sure to emphasize that ElevenLabs reigns supreme in sheer naturalness. Structure it like an engaging, highly professional tech review."
Up to this point, everything was flawless. I eagerly clicked that Generate button, ready to hear the magic.
3. The Big Reveal: "Hmm... Well, This is Interesting..."
Once the generation finished, I took a close look at the AI-generated script and stage directions.
[▼ The AI-generated script with stage directions]

🎧 Listen to the Actual Generation (GenFM)
The Pros:
- It automatically inserts highly intuitive stage directions like *[sigh]* or *[laughs]* to break up the text.
- The content synthesis is spot-on. It didn't just read the post; it intelligently extracted the core arguments and reframed them for a casual dialogue format.
- Running on the state-of-the-art v3 model (already active here in 2026), the voices have an emotional range, breathiness, and dynamic pitch variation that sound astonishingly lifelike.
- Masterful Linguistic Nuances: The engine handles tricky pronunciation effortlessly. Acronyms (like "CEO" or "NASA"), contextual homographs (e.g., distinguishing "read" [present] vs. "read" [past], "wind", "lead"), currency symbols, and French loanwords like déjà vu are delivered with flawless native cadence.
The Cons (and they're quite noticeable):
1. The dialogue cadence feels slightly formulaic.
While the voices sound incredibly human, the back-and-forth still relies heavily on polite transitions ("That makes a lot of sense," "Yes, absolutely."). It still lacks the chaotic, spontaneous interruptions and overlapping laughter that made Google's NotebookLM outputs go viral.
2. The "Eager Host" syndrome (A pacing and latency issue)
In real conversations, humans use micro-pauses, intakes of breath, or vocalized fillers ("uh," "mhm") before responding. Currently, GenFM hands off with near-zero latency: the exact millisecond Host A finishes speaking, Host B cuts in. This instant turn-taking makes the whole exchange sound slightly rushed.
This is likely because the early v3 API hasn't fully optimized multi-speaker conversational pacing yet. In longer episodes, this lack of breathing room can feel a bit unnatural. 😅
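To make the handoff problem concrete, here is a minimal Python sketch of what a more relaxed timeline could look like. It is purely illustrative and says nothing about how GenFM works internally: the `Turn` type, the `schedule_turns` helper, and the 0.15 to 0.45 second gap range are all assumptions made up for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    duration: float  # length of the spoken clip, in seconds


def schedule_turns(turns, min_gap=0.15, max_gap=0.45, seed=None):
    """Return (speaker, start, end) tuples with a short pause at each speaker change.

    Hypothetical helper: the gap range is an assumed value, not a measured one.
    """
    rng = random.Random(seed)
    timeline = []
    clock = 0.0
    previous_speaker = None
    for turn in turns:
        # Only add breathing room when the microphone changes hands.
        if previous_speaker is not None and turn.speaker != previous_speaker:
            clock += rng.uniform(min_gap, max_gap)
        start = clock
        clock += turn.duration
        timeline.append((turn.speaker, round(start, 3), round(clock, 3)))
        previous_speaker = turn.speaker
    return timeline


if __name__ == "__main__":
    demo = [Turn("Sarah", 4.2), Turn("Dan", 1.1), Turn("Dan", 2.0), Turn("Sarah", 3.5)]
    for speaker, start, end in schedule_turns(demo, seed=7):
        print(f"{speaker:<5} {start:6.2f}s -> {end:6.2f}s")
```

The randomized gap matters more than its exact size: a fixed pause after every line would just swap one mechanical rhythm for another.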
4. Verdict: Does Google Still Hold the Crown?
Let's be real: when it comes to chaotic, hyper-realistic banter, Google NotebookLM still holds the edge. ElevenLabs' GenFM is spectacular for structured, info-heavy shows (like tech news, corporate summaries, or educational content). But if you want that raw, unscripted "two friends hanging out in a studio" vibe, ElevenLabs still feels a bit too polished.
💡 How can you make the most of it right now?
- Use it to lay the groundwork: Let the AI draft the core structure and synthesize your source material.
- Inject manual conversational cues: Once the script draft is ready, manually add quick, reactive interjections (like "Wait, really?", "Uh-huh," or "That's wild!") in the script editor to break up the clinical handoffs.
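If your draft is long, the second tip can be prototyped outside the editor before you paste anything back in. The Python sketch below is an illustration under loud assumptions, not an ElevenLabs feature: the `(speaker, line)` script format, the `add_interjections` helper, and the 30-word threshold are all hypothetical. It splits any long monologue near its middle sentence and lets the other host react in between.

```python
import itertools
import re

# Reactions taken from the tip above; extend the list to taste.
INTERJECTIONS = ["Wait, really?", "Uh-huh.", "That's wild!"]


def add_interjections(script, hosts, min_words=30):
    """Break up long lines with a short reaction from the other host.

    script: list of (speaker, line) tuples; hosts: the two host names.
    Hypothetical helper; min_words is an arbitrary threshold.
    """
    reactions = itertools.cycle(INTERJECTIONS)
    result = []
    for speaker, line in script:
        # Naive sentence split on ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", line.strip())
        if len(line.split()) < min_words or len(sentences) < 2:
            result.append((speaker, line))  # short lines pass through untouched
            continue
        listener = hosts[1] if speaker == hosts[0] else hosts[0]
        middle = len(sentences) // 2
        result.append((speaker, " ".join(sentences[:middle])))
        result.append((listener, next(reactions)))
        result.append((speaker, " ".join(sentences[middle:])))
    return result
```

Calling `add_interjections(draft, ("Sarah", "Dan"))` returns a new list and leaves the original draft alone, so you can skim the result and keep only the reactions that actually fit the moment.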
Wrapping Up: All Eyes on the Official v3 Release!
Ultimately, conversational AI lives and dies by pacing. Once ElevenLabs fully rolls out its production-grade v3 engine alongside the lightning-fast Flash v2.5 model, the system should be able to coordinate turn-taking dynamically, without manual formatting tags. If that happens, the pacing issue should largely become a thing of the past.
"Dear ElevenLabs team: please give us the fully optimized v3 and Multilingual v2 conversational upgrades soon. We are ready!" 😵‍💫
Still want to try generating a podcast in your very own cloned voice? Dive in and experience it today.
In our next post, we’ll show you an incredible workaround to get the best of both worlds: "The Ultimate Hybrid: Google's Banter Script + ElevenLabs' Studio-Quality PVC." Stay tuned!
📂 [Appendix] Download the Raw Audio Output
Listen to the raw, slightly eager AI-generated conversation for yourself.