I don’t want to subtitle every word. – Where poetry mutates into cinema

The other day I found myself staring at yet another YouTube Short filled with giant animated captions.

Every word bounced.

Every sentence flashed.

Every thought was highlighted, underlined, colorized, and practically escorted to my eyeballs by a security detail.

And apparently this is considered good.

I should probably love it.

I don’t.

Maybe I’m getting old.

Or maybe I’m still trying to make little movies instead of little attention traps.

2/8

The moment I realized what was bothering me

I spend a lot of time creating AI videos.

Images. Narration. Music. Motion.

Sometimes it feels less like editing and more like conducting an orchestra where half the musicians are robots and the other half are hallucinating.

When I first started making vertical videos, everyone told me the same thing:

“Add content subtitles.”

Not subtitles.

Content subtitles.

Apparently there’s a difference.

Traditional subtitles help people understand what is being said.

Content subtitles are designed to make sure people don’t scroll away before you’ve said it.

That’s not a criticism.

It’s simply a different goal.

The problem was that every time I added them, something felt wrong.

The videos became louder even when there was no sound.

3/8

The experiment

So I started paying attention.

I watched dozens of YouTube Shorts and Instagram Reels.

I noticed a pattern.

Most creators were putting the entire script on screen.

Not selected ideas.

Not themes.

Not emotional anchors.

Everything.

If the narrator said:

“I walked into the room and immediately knew something was wrong.”

Then the screen displayed:

I WALKED INTO THE ROOM

AND IMMEDIATELY KNEW

SOMETHING WAS WRONG

The words were accurate.

The timing was perfect.

And yet somehow the experience felt smaller.

Lab note:

When viewers read every word, they stop discovering the story. They begin consuming information instead.

4/8

What I think is happening

Movies don’t normally work this way.

Books don’t work this way.

Poetry definitely doesn’t work this way.

In traditional storytelling, different parts of the experience carry different responsibilities.

The images do one job.

The music does another.

The voice does another.

The audience participates by connecting the pieces.

Content subtitles change the equation.

Suddenly the words become the dominant layer.

The viewer doesn’t listen.

The viewer reads.

The imagery becomes support material.

The soundtrack becomes support material.

Everything starts orbiting the text.

For educational content, that may be exactly the right choice.

For emotional storytelling, I’m not so sure.

5/8

The compromise I accidentally discovered

Instead of transcribing everything, I started selecting moments.

Only the phrases that mattered.

Only the ideas I wanted people to carry home.

For example:

Narration:

“I spent years searching for people who understood me. Most of the time I found noise instead.”

On-screen text:

SEARCHING FOR UNDERSTANDING

Later:

MOSTLY I FOUND NOISE

The difference was immediate.

The narration remained narration.

The visuals remained visuals.

The text became part of the composition rather than a running transcript.

Lab note:

A good text card behaves more like a musical refrain than a subtitle.
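
For anyone who wants to try this in an editor that can import subtitle files, here is a minimal sketch of the selective approach, not a description of my exact process. It writes only the chosen text cards to the SRT subtitle format and leaves the rest of the narration off the screen. The `TextCard` class, the helper names, and the timings are all made up for illustration; only the two card texts come from the example above.

```python
from dataclasses import dataclass


@dataclass
class TextCard:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def cards_to_srt(cards: list[TextCard]) -> str:
    """Render the cards as SRT text, numbered from 1 in time order."""
    blocks = []
    for index, card in enumerate(sorted(cards, key=lambda c: c.start), start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(card.start)} --> {srt_timestamp(card.end)}\n"
            f"{card.text}\n"
        )
    # A blank line separates one SRT entry from the next.
    return "\n".join(blocks)


# Hypothetical timings; only the card texts come from the example above.
cards = [
    TextCard(start=1.0, end=3.5, text="SEARCHING FOR UNDERSTANDING"),
    TextCard(start=6.0, end=8.5, text="MOSTLY I FOUND NOISE"),
]

srt_text = cards_to_srt(cards)
print(srt_text)
```

Saving the printed text as a .srt file gives you two cards and a lot of empty screen between them, which is the whole point.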

6/8

My current workflow

Today I think about on-screen text differently.

I ask one question:

What are the three to seven ideas that define this piece?

Those become the text.

Everything else belongs to the voice.

Sometimes the text appears as:

• A chapter title
• A thematic statement
• A lyric
• A memorable quote
• A poetic fragment

Rarely does it appear as a complete transcript.

The result feels less like social media content and more like a short film.

At least to me.

And since I’m the one spending hours staring at the timeline, my opinion gets one vote.

7/8

Tools & Creative Stack

For most projects I currently use:

• Midjourney
• Grok Imagine
• ChatGPT
• ElevenLabs
• LALAL.AI
• Kdenlive
• Audacity

Cost Breakdown

This particular discovery cost exactly zero dollars.

It merely required paying attention.

Which, unfortunately, is often the most expensive part of creativity.

The software wasn’t the solution.

Observation was.

The AI tools helped me generate images and narration.

But they didn’t tell me what kind of storyteller I wanted to be.

That part remains stubbornly human.

8/8

The real lesson

I finally realized that I don’t dislike subtitles.

I dislike surrendering every square inch of mystery.

A story needs room to breathe.

A viewer needs room to participate.

Sometimes the most powerful thing on the screen is not another word.

It’s the silence that follows it.

TL;DR

If you’re making educational content, subtitle everything.

If you’re making emotional content, consider subtitling only what matters.

Your audience can hear the rest.

As the video alchemist directing an increasingly strange orchestra of AI tools, I’ve learned that not every innovation deserves full volume.

Some ideas work best as whispers.

And sometimes the difference between content and cinema is simply knowing when to stop talking.

Steve Teare
video alchemist

TerminallyBored.Monster
Palouse, Washington USA