There’s a strange moment in this project where I stopped thinking in terms of inspiration entirely.
No “idea struck me.”
No character appeared.
No story unfolded in the traditional sense.
There was just a file on my desktop.
A summary of a transcript of a lecture called Why Empaths Disappear.
That’s it.
Not even the full transcript.
Just the compressed idea of it: Grok AI’s interpretation of an interpretation.
It had been sitting there for over a month.
Not forgotten.
Just waiting for a system that could actually use it.
2/10
The video that came out of it is called:
I Didn’t Leave. I Stopped Arriving.
It’s a conversation between a man and a woman.
But that description is already misleading.
Because the dialogue wasn’t the starting point.
It was the output.
What I actually started with wasn’t a story at all.
It was a concept fragment—already reduced, already shaped, already emotionally filtered.
And that mattered.
Because it meant I didn’t need to invent meaning.
I only needed to see if meaning could survive execution.
3/10
The real reason I started this project wasn’t narrative.
It was technical.
I wanted to test a new workflow for producing lip-sync video:
- faster
- cheaper
- more iterative than WAN AI
- less “precious” per clip
That was the spark.
Not the story.
The story came later, as something I needed in order to test the system properly.
Lab note:
Most creative systems are story-led. This one wasn’t. It was constraint-led.
4/10
The system itself is simple, but strict:
- 6-second video units for all dialogue
- Extreme face dominance for lip sync (70%+ frame fill)
- Grok Imagine for facial performance generation
- ElevenLabs for voice consistency and replacement
- Midjourney for character casting and emotional framing
- B-roll and silence as structural pacing tools
Once those constraints are in place, something interesting happens:
You stop writing scenes.
You start assembling time-based emotional fragments.
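Lab sketch:
One way to picture the constraint system is as a small rulebook that every planned unit gets checked against. This is only an illustration, not the tooling I actually used. The class names, field names, and the reading of "6-second units" as "no dialogue unit longer than 6 seconds" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    # Values taken from the list above.
    unit_seconds: float = 6.0     # length of one dialogue unit
    min_face_fill: float = 0.70   # share of the frame the face should fill


@dataclass(frozen=True)
class Unit:
    # Hypothetical description of one planned clip.
    kind: str                # "dialogue", "broll", or "silence"
    seconds: float
    face_fill: float = 0.0   # only meaningful for dialogue units


def violations(unit: Unit, rules: Constraints = Constraints()) -> list[str]:
    """Return the rules a planned unit breaks (empty list if none)."""
    problems = []
    if unit.kind == "dialogue":
        if unit.seconds > rules.unit_seconds:
            problems.append("dialogue unit is longer than the 6-second container")
        if unit.face_fill < rules.min_face_fill:
            problems.append("face fills less than 70% of the frame")
    # B-roll and silence are pacing tools, so this sketch leaves them unchecked.
    return problems


if __name__ == "__main__":
    print(violations(Unit("dialogue", 6.0, 0.75)))
    print(violations(Unit("dialogue", 8.0, 0.50)))
```

Only dialogue is checked here because the list above puts the strict rules on dialogue and treats B-roll and silence as free pacing.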
5/10
The unexpected breakthrough wasn’t creative—it was economic.
Traditional WAN AI lip-sync costs roughly:
- $0.50 per 5-second clip
This hybrid approach (Grok + ElevenLabs) produces usable lip-sync units for roughly:
- $0.13 per 6-second clip
Same function. Different system.
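Lab sketch:
The two prices cover different clip lengths, so the fair comparison is cost per second. A minimal calculation, using only the two rough figures quoted above (the function and variable names are just for illustration):

```python
# Rough figures quoted above, in US dollars.
WAN_COST_PER_CLIP = 0.50
WAN_CLIP_SECONDS = 5
HYBRID_COST_PER_CLIP = 0.13
HYBRID_CLIP_SECONDS = 6


def cost_per_second(cost_per_clip: float, clip_seconds: float) -> float:
    """Normalize a per-clip price to a per-second price."""
    return cost_per_clip / clip_seconds


wan_rate = cost_per_second(WAN_COST_PER_CLIP, WAN_CLIP_SECONDS)
hybrid_rate = cost_per_second(HYBRID_COST_PER_CLIP, HYBRID_CLIP_SECONDS)
ratio = wan_rate / hybrid_rate

print(f"WAN AI: ${wan_rate:.3f} per second")
print(f"Hybrid: ${hybrid_rate:.3f} per second")
print(f"Hybrid is roughly {ratio:.1f}x cheaper per second")
```

On those numbers the hybrid route works out to about 2 cents per second against 10 cents, roughly 4.6 times cheaper. Both inputs are approximate, so treat the ratio as a ballpark.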
Lab note:
When performance becomes cheap enough, it stops being special. It becomes selectable.
That changes how you think about emotion entirely.
6/10
Once cost dropped, iteration opened up.
And once iteration opened up, casting changed.
Midjourney stopped being “image generation” and became:
emotional auditioning
I wasn’t making characters.
I was selecting versions of emotional states:
- controlled withdrawal
- delayed recognition
- restrained presence
- subtle disconnection
Same faces.
Different emotional possibilities.
7/10
The script itself formed under constraint, not inspiration.
Dialogue became short because it had to be:
“You didn’t lose me.”
“I stopped arriving before I left.”
There’s no expansion beyond what can survive a 6-second container.
And surprisingly, that limitation removed noise instead of meaning.
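Lab sketch:
A rough way to test whether a line can live inside a 6-second container is to estimate its spoken length from its word count. The speaking rate below is an assumption picked for the example, not a measured value from this project, and the function names are hypothetical.

```python
UNIT_SECONDS = 6.0
ASSUMED_WORDS_PER_MINUTE = 150.0  # assumption: a moderate conversational pace


def estimated_seconds(line: str, wpm: float = ASSUMED_WORDS_PER_MINUTE) -> float:
    """Estimate spoken duration from a whitespace word count."""
    return len(line.split()) / wpm * 60.0


def fits_unit(line: str, unit_seconds: float = UNIT_SECONDS,
              wpm: float = ASSUMED_WORDS_PER_MINUTE) -> bool:
    """True if the estimated duration fits inside one unit."""
    return estimated_seconds(line, wpm) <= unit_seconds


if __name__ == "__main__":
    for line in ["You didn't lose me.", "I stopped arriving before I left."]:
        print(f"{estimated_seconds(line):.1f}s  fits={fits_unit(line)}  {line}")
```

At that assumed pace both lines come in well under 6 seconds, which leaves room for a pause before or after the words.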
Lab note:
Constraint doesn’t reduce expression. It removes everything that isn’t stable under pressure.
8/10
The B-roll ended up doing more emotional work than I expected.
Empty rooms.
Half-used chairs.
Phones placed face down.
Hands hovering, then retreating.
Nothing dramatic.
Just evidence that something had already changed before anyone said it out loud.
That became the emotional backbone of the piece.
Silence wasn’t absence.
It was delayed understanding.
9/10
What surprised me most is how coherent the end result feels.
Because on paper, it shouldn’t.
There was no full plan. No storyboard. No mapped narrative structure.
Just:
- a concept fragment
- a constraint system
- a set of tools
- and instinct operating inside a very narrow space
But that narrow space turned out to be enough.
Not to control the outcome.
But to keep it from falling apart.
And what emerged doesn’t feel engineered in its final form.
It feels fluid.
Human, even.
10/10
Looking back, I don’t think I built a video.
I think I built a small system that can take a compressed idea and hold it long enough to become visible.
Not by expanding it.
But by forcing it through constraints until only what can survive remains.
The story was never the starting point.
It was the thing that proved the system worked.
And now I’m left with something slightly strange:
A pipeline that doesn’t just generate content.
It reveals which ideas were strong enough to become real.
The actual cost of this video: $3.75. Think about that.
Steve Teare
video alchemist
TerminallyBored.Monster
Palouse, Washington USA
