The other day I found myself staring at yet another YouTube Short filled with giant animated captions.
Every word bounced.
Every sentence flashed.
Every thought was highlighted, underlined, colorized, and practically escorted to my eyeballs by a security detail.
And apparently this is considered good.
I should probably love it.
I don’t.
Maybe I’m getting old.
Or maybe I’m still trying to make little movies instead of little attention traps.
2/8
The moment I realized what was bothering me
I spend a lot of time creating AI videos.
Images. Narration. Music. Motion.
Sometimes it feels less like editing and more like conducting an orchestra where half the musicians are robots and the other half are hallucinating.
When I first started making vertical videos, everyone told me the same thing:
“Add content subtitles.”
Not subtitles.
Content subtitles.
Apparently there’s a difference.
Traditional subtitles help people understand what is being said.
Content subtitles are designed to make sure people don’t scroll away before you’ve said it.
That’s not a criticism.
It’s simply a different goal.
The problem was that every time I added them, something felt wrong.
The videos became louder even when there was no sound.
3/8
The experiment
So I started paying attention.
I watched dozens of YouTube Shorts and Instagram Reels.
I noticed a pattern.
Most creators were putting the entire script on screen.
Not selected ideas.
Not themes.
Not emotional anchors.
Everything.
If the narrator said:
“I walked into the room and immediately knew something was wrong.”
Then the screen displayed:
I WALKED INTO THE ROOM
AND IMMEDIATELY KNEW
SOMETHING WAS WRONG
The words were accurate.
The timing was perfect.
And yet somehow the experience felt smaller.
Lab note:
When viewers read every word, they stop discovering the story. They begin consuming information instead.
4/8
What I think is happening
Movies don’t normally work this way.
Books don’t work this way.
Poetry definitely doesn’t work this way.
In traditional storytelling, different parts of the experience carry different responsibilities.
The images do one job.
The music does another.
The voice does another.
The audience participates by connecting the pieces.
Content subtitles change the equation.
Suddenly the words become the dominant layer.
The viewer doesn’t listen.
The viewer reads.
The imagery becomes support material.
The soundtrack becomes support material.
Everything starts orbiting the text.
For educational content, that may be exactly the right choice.
For emotional storytelling, I’m not so sure.
5/8
The compromise I accidentally discovered
Instead of transcribing everything, I started selecting moments.
Only the phrases that mattered.
Only the ideas I wanted people to carry home.
For example:
Narration:
“I spent years searching for people who understood me. Most of the time I found noise instead.”
On-screen text:
SEARCHING FOR UNDERSTANDING
Later:
MOSTLY I FOUND NOISE
The difference was immediate.
The narration remained narration.
The visuals remained visuals.
The text became part of the composition rather than a running transcript.
Lab note:
A good text card behaves more like a musical refrain than a subtitle.
6/8
My current workflow
Today I think about on-screen text differently.
I ask one question:
What are the three to seven ideas that define this piece?
Those become the text.
Everything else belongs to the voice.
Sometimes the text appears as:
• A chapter title
• A thematic statement
• A lyric
• A memorable quote
• A poetic fragment
Rarely does it appear as a complete transcript.
The result feels less like social media content and more like a short film.
At least to me.
And since I’m the one spending hours staring at the timeline, my opinion gets one vote.
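For anyone who wants to try the selective approach without hand-typing every title clip, here is a minimal sketch in Python. It assumes you jot down a handful of text cards with start and end times and then want a standard SRT subtitle file to import into an editor that accepts one. The `TextCard` class, the helper names, and the timings are all hypothetical, made up for illustration; only the two phrases come from the example earlier in this piece.

```python
from dataclasses import dataclass


@dataclass
class TextCard:
    """One on-screen phrase (hypothetical structure, not from any tool)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cards: list[TextCard]) -> str:
    """Render the cards as SRT text, numbered in order of appearance."""
    blocks = []
    ordered = sorted(cards, key=lambda card: card.start)
    for index, card in enumerate(ordered, start=1):
        timing = f"{format_timestamp(card.start)} --> {format_timestamp(card.end)}"
        blocks.append(f"{index}\n{timing}\n{card.text}\n")
    # SRT entries are separated by a blank line.
    return "\n".join(blocks)


# Timings below are invented for the example.
cards = [
    TextCard(2.0, 5.0, "SEARCHING FOR UNDERSTANDING"),
    TextCard(9.5, 12.0, "MOSTLY I FOUND NOISE"),
]

srt_text = build_srt(cards)
print(srt_text)
```

Saving `srt_text` to a `.srt` file gives you a subtitle track containing only the chosen phrases. Whether your editor imports it cleanly, and how it styles the text, depends on the editor and version, so treat this as a starting point rather than a finished pipeline.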
7/8
Tools & Creative Stack
For most projects I currently use:
• Midjourney
• Grok Imagine
• ChatGPT
• ElevenLabs
• LALAL.AI
• Kdenlive
• Audacity
Cost Breakdown
This particular discovery cost exactly zero dollars.
It merely required paying attention.
Which, unfortunately, is often the most expensive part of creativity.
The software wasn’t the solution.
Observation was.
The AI tools helped me generate images and narration.
But they didn’t tell me what kind of storyteller I wanted to be.
That part remains stubbornly human.
8/8
The real lesson
I finally realized that I don’t dislike subtitles.
I dislike surrendering every square inch of mystery.
A story needs room to breathe.
A viewer needs room to participate.
Sometimes the most powerful thing on the screen is not another word.
It’s the silence that follows it.
TL;DR
If you’re making educational content, subtitle everything.
If you’re making emotional content, consider subtitling only what matters.
Your audience can hear the rest.
As the video alchemist directing an increasingly strange orchestra of AI tools, I’ve learned that not every innovation deserves full volume.
Some ideas work best as whispers.
And sometimes the difference between content and cinema is simply knowing when to stop talking.
Steve Teare
video alchemist
TerminallyBored.Monster
Palouse, Washington, USA
