1/9
I was staring at a MidJourney still… thinking: “Sure, this looks nice… but it doesn’t breathe.”
It started as a seductive, almost naïve idea: take four vertical 9:16 images, feed them into AI, and—poof—20 seconds of video. A blink-and-you-miss-it experiment. I imagined effortless motion, seamless continuity, a character alive without ever stepping onto a set.
The reality? AI taught me humility, the subtle tyranny of micro-gestures, and why the uncanny valley feels like a tiny electric shock to your chest.
2/9
Step 1: Building the Reference Stills
I began in MidJourney. Four images of a single woman—my reference character—had to hold across shots: same face, wardrobe, period details, gaze, and environment. Prompts were stripped down, no ambiguity, no “or” statements, no punctuation, no embedded negatives. Every word mattered.
white woman early 30s, long straight brown hair, neutral makeup, Victorian-era modest gray blouse with high collar and long sleeves, dark brown bodice, long dark skirt, black leather ankle boots, standing near window, small plain room, white plaster walls, thin white curtain, natural daylight, full figure from floor to head visible in vertical frame, subject positioned left side of frame, wooden chair with folded blanket on right side, head turned slightly toward chair, gaze toward chair, arms resting naturally at sides, curtain edge partially obscuring frame, imperfect composition --chaos 10 --stylize 250 --no looking at camera, smiling, posing, glamour, symmetry
(Prompt written by ChatGPT)
Lab note: Even the tiniest difference in wording could create a character who looked like her but wasn’t her. The AI isn’t psychic. It doesn’t know you want continuity. It only knows descriptors.
The first outputs were… educational. Shot one: legs cropped oddly, sometimes staring at the camera, sometimes peering out a window. Shot three: folded hands mutated into what I can only call “mangled sausages.” Shot four held together better, but a half-degree shift in gaze could ruin continuity.
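The consistency fight described above can be partly mechanized: if the character description is one shared block of text, it literally cannot drift between shots. A minimal Python sketch of that idea (the scene strings and function name are my own illustration; the character descriptors and `--chaos` / `--stylize` / `--no` parameters come from the prompt above):

```python
# Sketch: compose per-shot MidJourney prompts from one shared character
# block so the wording never drifts between shots. Scene text is a
# placeholder; only the variable part changes per shot.

CHARACTER = ("white woman early 30s, long straight brown hair, neutral makeup, "
             "Victorian-era modest gray blouse with high collar and long sleeves, "
             "dark brown bodice, long dark skirt, black leather ankle boots")

SUFFIX = "--chaos 10 --stylize 250 --no looking at camera, smiling, posing, glamour, symmetry"

def shot_prompt(scene):
    # Identical character block every time; only the scene varies.
    return f"{CHARACTER}, {scene} {SUFFIX}"

shots = [
    shot_prompt("standing near window, small plain room, natural daylight"),
    shot_prompt("seated on wooden chair, gaze toward window, natural daylight"),
]
```

Nothing clever, but it removes one whole class of continuity errors: the ones caused by retyping the character by hand.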
3/9
Step 2: Thinking Like AI, Not Like a Filmmaker
This is the mental pivot: filmmakers want to imply consistency. AI needs instructions spelled out literally.
- “Intimate” gets censored.
- “Same wardrobe” is meaningless.
- “Full-body with feet visible” is a gamble.
Everything you assume a human would see must be spelled out in painstaking literal terms. Punctuation, embedded negatives, vague adjectives—all sabotage the prompt.
Lab note: It’s humbling. You feel like a drill sergeant of pixels, barking instructions, hoping the AI gets the nuance of your vision.
After multiple iterations, we had four images consistent enough to serve as visual anchors. Yet, they were just stills.
4/9
Step 3: Enter Motion — SeeDance
Animating stills is a completely different beast. I turned to SeeDance. Feed it a still, and it generates subtle head tilts, breathing, micro-shifts in posture. It preserves identity, wardrobe, environment—things MidJourney can’t.
But. Micro-expression? Emotional nuance? Forget it. “Wistful gaze” is a polite suggestion, not a command. The AI nods, then hands you a robot staring into the void.
Here’s where you layer beyond visuals: voiceover and music. They carry the interior life AI motion cannot deliver.
5/9
Step 4: Voiceover — The Secret Weapon
We had 20 seconds of video: four shots, each slightly animated. The visuals anchored identity and environment, but emotion needed voice. Enter ElevenLabs.
Raw narration doesn’t cut it. The script had to be poetic, pensive, interior. “Show, don’t tell” became the mantra.
Example voiceover layering:
[wistful] There’s a shadow in me… [sighs] a quiet ache that twists between memory and regret...
[reflective] The light falters, and a half-forgotten thought grips me… [long pause] lingering where I thought it had passed...
[melancholy] My pulse trembles with the weight of what is unsaid… [exhales sharply] a longing that presses into the hollow of my chest...
[soft] I reach toward something I cannot name, uncertain if it waits, or if I am already too late to hold it...
Emotion tags and breath cues are everything. Suddenly, slightly animated stills felt alive.
Lab note: AI can mimic micro-motion. Voiceover gives it soul.
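The tag-plus-cue format in the lines above is regular enough to assemble programmatically, which keeps a longer script consistent. A small sketch (the helper is my own, not part of any ElevenLabs API; the bracket format mirrors the examples above):

```python
# Sketch: build voiceover script lines in the [emotion] ... [breath cue]
# format shown above. Purely a formatting helper, labeled hypothetical.

def tag_line(emotion, text, cue=None):
    """Prefix a line with an emotion tag; optionally append a breath cue."""
    line = f"[{emotion}] {text}"
    if cue:
        line += f" [{cue}]"
    return line

script = [
    tag_line("wistful", "There's a shadow in me...", "sighs"),
    tag_line("soft", "I reach toward something I cannot name."),
]
```

Generating the tags this way means every line arrives at the voice model in the same shape, so you iterate on the words, not the markup.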
6/9
Step 5: Music — Emotional Glue
The visuals and voice set the stage. Music completes it.
- Drone in E minor.
- Slow, sparse, with subtle swells.
- Accents timed to the cadence of introspection.
Music isn’t background fluff. It guides perception, shapes emotional peaks, highlights breath pauses, and deepens longing. The frames are the canvas; the score is the brushstroke of feeling.
7/9
Step 6: Lessons Learned
1️⃣ AI Isn’t Psychic — Be Concrete
Spell out every wardrobe, gaze, and prop detail. AI takes you literally, not intuitively.
2️⃣ Consistency is Hard
Even tiny posture or gaze shifts break continuity. Plan for minor variations in editing.
3️⃣ Motion is Best Layered Separately
SeeDance works for breathing, posture, slight tilts, camera motion. Emotional nuance must come from other layers.
4️⃣ Voiceover is the Secret Weapon
Poetic, interior scripts + emotion tags + breath cues = believable internal life.
5️⃣ Music is Emotional Glue
Drone swells, harmonics, and textures layered under the voiceover deepen the impact.
6️⃣ Small Details Matter
- Off-center composition.
- Spatial anchors (desks, curtains).
- Camera movement variation: dolly, pan, push-in.
- Micro-gestures: hand movement, tiny head shifts.
7️⃣ Embrace Limitations
AI can’t replace human intuition. Stillness invites reflection; motion + sound carry the story.
8️⃣ Experiment, Iterate, Test
Prompt iterations, gaze correction, and micro-gesture adjustments teach more than theory ever could.
8/9
Step 7: Final Takeaways
This 20-second experiment—four stills, subtle AI motion, poetic voiceover, drone score—is deceptively simple.
AI is a tool, not a storyteller. It can give you faces, hands, bodies that move. But feeling? That’s human work. The narrative, interior life, and emotional resonance live in your direction, layering, pacing, and restraint.
For filmmakers at TerminallyBored.Monster: stop chasing perfect AI output. Embrace quirks. Let voice and music breathe life. Think of AI as your emotional amplifier, not a replacement for intuition.
When the stills move, the voice speaks, and the drone hums… the robot stops feeling uncanny. The story begins.
9/9
Tools & Creative Stack
- MidJourney – Stills generation, reference images
- SeeDance – Still-to-video subtle motion
- ElevenLabs – Interior, poetic voiceover (Francesca Segretto)
- Kdenlive – Editing & layering
- Producer.ai – Drone music in E minor, emotional cadence
- ChatGPT – Image and music prompt generation
TL;DR: AI gives you motion without soul. You give it life. Layer carefully, look for micro-details, and embrace the robot’s quirks.
Reality Check: Don’t obsess over details. Chasing impossible perfection only causes delays and budget overruns. Embrace AI mediocrity. Tell a better story.
— Steve Teare
video alchemist
TerminallyBored.Monster
Palouse, Washington
