I killed the song… finally the film breathed. – Where poetry mutates into cinema

1/9
I knew something was wrong the moment the music started.

It sounded… heavy.
Like it was dragging the images instead of lifting them.

Julie’s poem deserved better than a dirge.

So I did something I usually resist.
I scrapped the song entirely.

2/9

The Spark

The poem was already doing the emotional work.

It had rhythm.
It had breath.
It had that quiet, spiraling feeling of being stuck inside something long after it’s over.

And I buried it under a melody, turning it into song lyrics.

Classic mistake.

So I stripped it all back and asked a simpler question:

What if a voice carries everything?

No melody.
No structure to hide behind.
Just words… and space.

3/9

Step 1: The Voiceover Problem

I moved the poem into Producer.ai and started generating narration instead of lyrics.

It should have been easy.

It wasn’t.

Pronunciation broke everything.

“Unearthed” turned into something alien
Cadence drifted
Emotional emphasis landed in the wrong places

So I iterated.

Again.
And again.
And again.

At one point I started spelling words phonetically just to force the delivery.

Lab note: AI voice isn’t about realism. It’s about control.
You don’t get natural speech — you sculpt it.

Eventually, something clicked.

The voice stopped sounding like a machine…
and started sounding like a person thinking out loud.
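The phonetic-spelling trick can be made repeatable with a small substitution table, applied to the script before it goes into the voice tool. This is a minimal sketch in Python that assumes plain-text input. The respelling shown for "unearthed" is a hypothetical example, not the spelling used for this film, and nothing here calls Producer.ai itself.

```python
from __future__ import annotations

import re

# Hypothetical respelling table: the actual phonetic spellings used for
# this film are not given, so this entry is only an illustration.
RESPELLINGS = {
    "unearthed": "un-ERTHT",
}


def respell(text: str, table: dict[str, str]) -> str:
    """Swap whole words for phonetic spellings, leaving punctuation intact."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        # Look up the lowercase form so capitalized words get the same fix.
        return table.get(word.lower(), word)

    # Match runs of letters (and straight apostrophes) as "words".
    return re.sub(r"[A-Za-z']+", swap, text)


if __name__ == "__main__":
    print(respell("The memory, unearthed.", RESPELLINGS))
```

Because the table is keyed on lowercase words, "Unearthed" at the start of a line gets the same fix as "unearthed" mid-sentence. Every fix then lives in one place instead of being retyped on each pass.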

4/9

Step 2: The Images (144 Attempts at Coherence)

While the audio was stabilizing, I moved into Midjourney.

I leaned into a single idea:

Alice in Wonderland, but internal.

Not whimsy.
Not fantasy.

Disorientation.

Victorian hallway.
Endless corridors.
Doors that don’t quite behave.

A woman moving through it — or sometimes not there at all.

I generated 144 images.

Not because I needed that many.
Because I didn’t trust any single one to be right.
And… I run in Relax mode instead of Fast mode. That means "all-you-can-eat" images, with no Fast GPU hours burned.

Lab note: Volume is not excess. It’s exploration.

Midjourney doesn’t give you the image.
It gives you possibilities.

You find the film later.

5/9

Step 3: Continuity Is a Lie (Until You Force It)

This part fought me.

Hard.

Hair up. Hair down.
Frontal stare. Dead center.
The “why is she looking at me like that?” problem

Even with reference images, the system drifts.

So I had to lock things manually:

Same dress. Same wording. Every prompt.
Same hair description. Every prompt.
Same environment block. Every prompt.

And even then…

It still tried to improvise.

Lab note: Midjourney rewards consistency, not creativity — until you earn the right to break it.
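The manual locking can be sketched as a small prompt builder. Fixed text blocks for dress, hair, and environment are appended to every shot in the same order and wording, and a flag drops the subject blocks for shots where she is gone. This is a minimal sketch in Python that uses hypothetical placeholder wording, since the real prompt text isn't shown here. The only actual Midjourney syntax it uses is the `--ar` aspect-ratio parameter, set to 9:16 for vertical frames.

```python
# Hypothetical locked blocks: placeholders, not the prompts used in the film.
LOCKED_BLOCKS = {
    "dress": "woman in a high-collared dark Victorian dress",
    "hair": "hair pinned up in a loose bun",
    "environment": "endless Victorian hallway, repeating doors, dim gaslight",
}

# Fixed order so every prompt repeats the same wording in the same place.
BLOCK_ORDER = ("dress", "hair", "environment")


def build_prompt(shot: str, include_subject: bool = True, aspect: str = "9:16") -> str:
    """Combine a per-shot description with the locked blocks.

    When include_subject is False (an empty-hallway shot), only the
    environment block is kept. --ar is Midjourney's aspect-ratio parameter.
    """
    keys = BLOCK_ORDER if include_subject else ("environment",)
    parts = [shot] + [LOCKED_BLOCKS[k] for k in keys]
    return ", ".join(parts) + f" --ar {aspect}"


if __name__ == "__main__":
    print(build_prompt("she walks away down the corridor, seen from behind"))
    print(build_prompt("empty hallway, a door ajar", include_subject=False))
```

Keeping the blocks in one place means a wording change happens once and lands in every prompt, which is the whole point of locking them.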

6/9

Step 4: The Frame Problem (Vertical Reality)

Everything was 9:16 vertical.

Which means:

If she’s in the frame… she’s in the center.

And if she’s in the center…

She’s staring straight into the camera.

Every. Single. Time.

So I changed strategy:

Some shots: she’s fully present
Some shots: partial — hands, reflection, silhouette
Some shots: she’s gone completely

The hallway takes over.

The space breathes.

And when she comes back… it matters again.

7/9

Tools & Creative Stack

Producer.ai: voiceover generation and music (and pronunciation wrestling)
Midjourney v7: image generation (144 frames total)
ChatGPT: image and music prompt generation
Kdenlive: final assembly and sequencing
WAN video: lip sync (one 10-second clip)
My own stubbornness: critical component

8/9

The Real Discovery

The music was the problem.

Not because it was bad. (OK, maybe it was a little. Think nursery rhyme in a minor key. Not good.)

But because it was too much.

It told the viewer what to feel.
The voice invites them to feel it themselves.

Key takeaways:

Emotion doesn’t need amplification — it needs space
Voiceover can carry weight that lyrics sometimes smother
Less structure = more honesty

Or, more bluntly:

I didn’t fix the piece until I removed the thing I thought made it “complete.”

9/9

Watch the video at:

This one surprised me.

Not because it worked.

But because it only worked once I let go of control where it was getting in the way… and tightened it where it mattered.

The images finally stopped posing.
The voice finally stopped performing.

And somewhere in between…
the piece started breathing.

— Steve Teare
video alchemist