Making AI films feel like cinema (and why identity breaks when you let it breathe).

There’s a moment you hit in AI video work where everything looks like it should be working… but isn’t.

The shots are beautiful.
The lighting is right.
The camera language feels almost expensive.

And yet the whole thing collapses into something slightly off—like a dream that keeps changing the actor’s face every time you blink.

That’s where this recent exploration landed me.

Not in “how do I make better prompts?”
But in something more uncomfortable:

I’ve been fighting the system in the wrong place.

The problem wasn’t the shots — it was the identity layer.

I started from a simple observation.

In 9:16 AI filmmaking, especially with Midjourney-style generation tools, I kept running into a frustrating split:

If I used reference-based identity tools (such as Midjourney's Omni Reference, --oref), I got consistency… but boring framing
If I removed them, I got beautiful cinematic shots… but identity drift

So I tried to solve it like a filmmaker:

“Fix continuity in post. Add control later.”

But that wasn’t the real issue.

The real issue was this:

Identity tools don’t just “hold a face.”
They reshape the entire visual system around that face.

And not in a helpful way.

They collapse composition into safe, repetitive portrait logic:

too many headshots
too many centered faces
too little camera aggression
too little cinematic risk

Lab note: I originally thought this was a limitation of prompting. It’s not. It’s a structural bias in how identity conditioning steers composition.

The first breakthrough: identity is not a reference problem — it’s a prompt completeness problem.

At first I assumed the solution was abstraction:

“Don’t describe too much. Let the model imagine.”

That was wrong.

Abstraction didn’t create freedom. It created instability.

The model didn’t get more cinematic—it just started re-inventing the character from scratch every shot.

So instead of solving identity with less information, I tried the opposite:

I over-specified it.

Age. Hair. Skin tone. Wardrobe. Era. Physical presence. Emotional posture.

Not as decoration.

As constraint.

And something interesting happened:

When identity becomes fully defined in text, it stops interfering with composition.

It becomes stable.

Predictable.

Reusable.

Almost like casting a real actor instead of re-generating a new one every shot.

Lab shift: from reference-driven identity to structured character blocks.

This is where the workflow actually changed.

Instead of:

“use character reference”
or “same as before”
or abstract identity phrases

I started building a fixed identity block:

Age
Skin tone
Hair style
Hair color
Body type
Wardrobe baseline
Emotional presence

And then I stopped touching it.

Every prompt became:

Identity block + camera instruction + scene action

That separation mattered more than anything else.

Lab note: this is where AI filmmaking starts feeling less like prompting and more like production design.
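A minimal sketch of that separation, assuming the prompts are assembled in Python before being pasted into the generator. The class name, the function name, and the character details are hypothetical placeholders, not my actual character sheet. The --ar 9:16 flag is only a stand-in for whatever parameters your tool expects.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)  # frozen: the block cannot be edited between shots
class IdentityBlock:
    age: str
    skin_tone: str
    hair_style: str
    hair_color: str
    body_type: str
    wardrobe: str
    presence: str

    def text(self) -> str:
        # Always emit the fields in the same order, so every prompt
        # starts with identical identity wording.
        return ", ".join(astuple(self))


def build_prompt(identity: IdentityBlock, camera: str, action: str,
                 flags: str = "--ar 9:16") -> str:
    """Identity block + camera instruction + scene action, kept as separate inputs."""
    body = ". ".join([identity.text(), camera, action]) + "."
    return f"{body} {flags}".strip()


# Hypothetical character, written once and then left alone.
HERO = IdentityBlock(
    age="woman in her late 30s",
    skin_tone="olive skin",
    hair_style="shoulder-length wavy hair",
    hair_color="dark brown hair color",
    body_type="lean athletic build",
    wardrobe="worn grey wool coat over a black turtleneck",
    presence="guarded, watchful posture",
)

print(build_prompt(HERO, "low-angle wide shot", "she crosses an empty parking lot at dusk"))
print(build_prompt(HERO, "over-the-shoulder close-up", "she reads a crumpled letter"))
```

Only the camera and action strings change from shot to shot. The identity text is byte-for-byte the same in every prompt, which is the whole point of not touching it.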

The second discovery: cinematic freedom comes from removing identity tools, not identity itself.

This was the contradiction I didn’t expect.

The most cinematic shots came from:

removing --oref
removing identity shortcuts
removing reference anchoring

But not removing identity itself.

That distinction is everything.

Because what actually kills dynamism is not consistency—it’s how consistency is enforced.

Reference-based identity forces:

face-first framing
conservative composition
portrait gravity

Text-defined identity allows:

full-body shots
aggressive angles
environmental storytelling
spatial freedom

Same character. Different physics.

The production reality: expensive shots, selective repair.

Of course, this introduced a new problem.

Now I had freedom—but less guaranteed consistency.

So the workflow split into two passes:

cinematic generation pass
selective identity repair pass

Not everything gets fixed.

Only the shots that matter:

emotional close-ups
dialogue beats
narrative anchors

Everything else is left raw if it works.

Lab note: this is where cost actually gets controlled. Not in generation—but in deciding what is worth stabilizing.
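The repair decision can be sketched as a simple filter, assuming each shot carries a label for what kind of beat it is and a flag for whether the face held up. The field names, the labels, and the credits_per_fix figure are all made up for illustration; real credit costs depend on the tool.

```python
# Beats considered worth stabilizing (taken from the list above).
REPAIR_WORTHY = {"emotional close-up", "dialogue beat", "narrative anchor"}


def plan_repair_pass(shots, credits_per_fix):
    """Return (shots to repair, estimated credits for the repair pass).

    A shot is repaired only if it matters narratively AND its identity
    drifted. Everything else is left raw.
    """
    to_fix = [
        s for s in shots
        if s["kind"] in REPAIR_WORTHY and not s["identity_ok"]
    ]
    return to_fix, len(to_fix) * credits_per_fix


# Hypothetical output of the cinematic generation pass.
shots = [
    {"id": "s01", "kind": "wide establishing", "identity_ok": False},
    {"id": "s02", "kind": "emotional close-up", "identity_ok": False},
    {"id": "s03", "kind": "dialogue beat", "identity_ok": True},
    {"id": "s04", "kind": "narrative anchor", "identity_ok": False},
]

to_fix, cost = plan_repair_pass(shots, credits_per_fix=4)
print([s["id"] for s in to_fix], cost)
```

In this example the drifting wide shot (s01) is left alone because nobody reads a face at that distance, and the dialogue beat (s03) is skipped because it already works. Only two of the four shots cost anything to stabilize.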

The hidden system: shot tagging becomes the real editing tool.

Once the pipeline expanded, I had to stop thinking in clips and start thinking in categories.

Every shot now gets tagged:

role (setup, tension, reveal, payoff)
camera (over-the-shoulder or OTS, close-up, wide, reaction)
identity status (OK, partial, broken)
fix status (raw, fixed, locked)

What this does is simple but powerful:

It stops the workflow from becoming emotional.

You’re no longer reacting to every clip.

You’re routing them.

Lab note: this is where AI video stops feeling like improvisation and starts feeling like orchestration.
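One way the tags could be written down, as a rough sketch. The four tag sets are the ones listed above; the routing rules in route() are my assumption about a sensible default, not a fixed recipe, and the bin names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    SETUP = "setup"
    TENSION = "tension"
    REVEAL = "reveal"
    PAYOFF = "payoff"


class Camera(Enum):
    OTS = "ots"
    CLOSE_UP = "close-up"
    WIDE = "wide"
    REACTION = "reaction"


class Identity(Enum):
    OK = "ok"
    PARTIAL = "partial"
    BROKEN = "broken"


class Fix(Enum):
    RAW = "raw"
    FIXED = "fixed"
    LOCKED = "locked"


@dataclass
class Shot:
    name: str
    role: Role
    camera: Camera
    identity: Identity
    fix: Fix = Fix.RAW


def route(shot: Shot) -> str:
    """Assumed routing rules: decide where a shot goes next from its tags alone."""
    if shot.fix in (Fix.FIXED, Fix.LOCKED):
        return "edit"        # already handled, goes to the timeline
    if shot.identity is Identity.OK:
        return "edit"        # raw but working, leave it alone
    if shot.identity is Identity.BROKEN:
        return "regenerate"  # not worth repairing
    return "repair"          # partial drift, candidate for the repair pass


def group(shots):
    """Sort shot names into bins keyed by their route."""
    bins = {}
    for shot in shots:
        bins.setdefault(route(shot), []).append(shot.name)
    return bins


shots = [
    Shot("s01", Role.SETUP, Camera.WIDE, Identity.OK),
    Shot("s02", Role.REVEAL, Camera.CLOSE_UP, Identity.PARTIAL),
    Shot("s03", Role.TENSION, Camera.OTS, Identity.BROKEN),
    Shot("s04", Role.PAYOFF, Camera.REACTION, Identity.PARTIAL, Fix.FIXED),
]
print(group(shots))
```

The useful part is that route() never looks at the clip itself. Once a shot is tagged, the next step follows from the tags, which is what keeps the decision from turning into a reaction.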

The uncomfortable truth: smooth “Hollywood” AI is not about generation quality.

As I watched the reference material I had been studying, something stood out.

The polish wasn’t coming from better AI.

It came from:

extreme shot density control
intentional pacing (fractions of seconds matter)
selective lip sync usage
ruthless editing rhythm
and very deliberate over-the-shoulder spatial logic

But the real hidden cost?

Time.

Credits.

Iteration cycles.

This is not a fast workflow.

It’s a controlled burn.

The real insight I’m taking from this.

It’s not that AI filmmaking needs better prompts.

It’s that it needs clearer separation of roles:

Identity is not cinematography.
Cinematography is not identity.
Editing is not generation.

When those collapse into one prompt, everything becomes average.

When they separate, the system starts to behave like a real production pipeline.

Imperfect. Expensive. Slow.

But cinematic.

Closing thought:

I used to think the goal was to make AI “understand the shot.”

Now I think the goal is simpler and harder:

To stop the system from confusing who is in the frame with how the frame is built.

Once you separate those two things, something interesting happens.

The images stop feeling generated.

They start feeling staged.

And that’s where cinema quietly begins.

Steve Teare
video alchemist

TerminallyBored.Monster
Palouse, Washington USA