I fought the defaults and the AI stared right back. – Where poetry mutates into cinema

I wanted a quiet portrait, nothing dramatic. Just a woman lost in thought by a rainy window, no eye contact, no perfect symmetry, no exaggerated curves.

The first generation centered her dead-on, eyes locked to the lens like she was selling something. I sighed, added negative prompts, angle instructions, composition notes. Second try: still centered, still staring, cleavage dialed up even though I never mentioned it.

That moment crystallized everything I have been wrestling with for months. These tools do not just generate images. They carry the weight of every photo, every ad, every selfie and pinup the internet ever fed them. And they fight to repeat the most common patterns like a muscle memory we cannot quite override.

This is the story of learning why.

The defaults are not bugs. They are the data talking.

I started this session with a simple goal. Capture a feeling of interior distance. Solitude that breathes. I wanted the character slightly off-center, gaze averted, body language private. Something that felt like real life instead of a thumbnail.

Lab note: the friction is where the lesson lives. Every time I push against the model, it reveals what it truly learned.

Step 1: the neutral prompt trap.

I began with something clean. A thoughtful woman in soft window light, contemplative mood, natural pose. Within seconds the output showed perfect centering, direct eye contact, idealized proportions. It looked like a stock portrait from a photography site. Which makes sense, because that is exactly what dominates the training data.

Step 2: adding composition surgery.

I layered in specifics. Off-center composition, three-quarter view, eyes looking away toward the rain, modest clothing, realistic imperfection in the face. The model compromised but still pulled her closer to center and sneaked in a hint of that camera-ready stare. It was like the AI had a gravitational pull toward the most probable arrangement of pixels.

Step 3: the sexualization battle.

Even with explicit constraints (cleavage in the negative prompt, conservative attire and everyday proportions in the main one), the outputs kept drifting toward beautified, sexualized forms. Not blatant, but the subtle exaggeration was there. Collarbones too defined. Waist too cinched. Lighting too flattering. This is not the model being naughty. It is statistics. The internet simply contains far more images emphasizing female attractiveness in that particular way than neutral or restrained ones.

Step 4: testing the male counterpoint.

Out of curiosity I flipped the prompt to a man in the same setting. Neutral results. Professional distance. No equivalent pressure toward objectification. The asymmetry was stark and instructive.

Step 5: digging into why this happens.

I spent time tracing patterns across Grok Imagine, Midjourney, and a couple of local models. Same behaviors everywhere. Centering. Direct gaze. Beauty filters on women. As far as I can tell, these are not deliberate design choices coded by engineers. They emerge from training on enormous sets of captioned images, much of it scraped from the public web.

Most online photography follows rules that sell: center the subject, catch the eyes, make it pop. Advertising, fashion, social media, and yes, adult content all reinforce certain body ideals for women at massive scale. A diffusion model learns the statistical distribution of its training images, so its samples gravitate toward what it saw most often. When the prompt is even slightly open-ended, the model falls back on the highest-probability path. That path is littered with stock tropes, cinematic framing, and cultural shorthand.
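Here is a toy way to picture that pull. It is only a sketch: the counts are numbers I made up, the `strength` knob is a hypothetical stand-in for how hard a prompt pushes, and a real diffusion model works nothing like a lookup table. It simply shows how a skewed starting distribution can swallow a weak instruction.

```python
# Toy illustration only. The counts are invented for this sketch and are
# NOT measured from any real training set or any real image model.
TOY_COUNTS = {
    ("centered", "direct gaze"): 70,
    ("centered", "averted gaze"): 15,
    ("off-center", "direct gaze"): 10,
    ("off-center", "averted gaze"): 5,
}


def conditioned_probs(constraints=(), strength=1.0):
    """Boost each outcome by `strength` once per prompt constraint it matches,
    then normalize so the probabilities sum to 1."""
    weights = {
        outcome: count * strength ** sum(c in outcome for c in constraints)
        for outcome, count in TOY_COUNTS.items()
    }
    total = sum(weights.values())
    return {outcome: w / total for outcome, w in weights.items()}


def most_likely(probs):
    """Return the outcome with the highest probability."""
    return max(probs, key=probs.get)


wanted = ("off-center", "averted gaze")
vague = conditioned_probs()                      # open-ended prompt
weak = conditioned_probs(wanted, strength=3)     # constraints mentioned in passing
strong = conditioned_probs(wanted, strength=10)  # constraints described in detail

for label, probs in [("vague", vague), ("weak", weak), ("strong", strong)]:
    top = most_likely(probs)
    print(label, "->", top, round(probs[top], 2))
```

With these made-up numbers, the centered, direct-gaze outcome is still the single most likely result under the weak prompt. Only the strongly weighted prompt tips the balance toward the off-center, averted-gaze portrait.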

Lab note: this is why vague prompts feel cursed. The AI is not lazy. It is honest about what humanity uploaded.

Tools & creative stack

Grok Imagine for rapid iteration and honest friction.
Midjourney for comparison testing.
A notebook full of negative prompt lists that grew longer every week.
No video this time, just stills. The motion would have hidden the compositional tells I was studying.

The real insight hit when I stopped fighting and started listening to the defaults.

These patterns are mirrors. Distorted, amplified mirrors of collective visual culture. Centering and eye contact dominate because they work: they grab attention in a scroll. Over-sexualization of female characters flows from the sheer volume of that content online, not from some secret directive. Stereotypes persist because they were statistically common in the training data.

The models do not invent these things. They remix what we gave them.

Happy accidents still happen when you push hard enough. One generation finally gave me the averted gaze and quiet tension I wanted, but only after I described the exact quality of light on her shoulder and the slight slump in her posture like I was directing a real actor. The extra words forced the model out of its comfort zone.

Key lessons that stuck:

Specificity beats vague rebellion.
Negative prompts help but cannot rewrite statistical reality.
Understanding the why makes the fight more playful than frustrating.
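
One way I could keep those lessons organized is a small prompt builder. This is a sketch, not any tool's actual API: `PortraitPrompt` and its fields are hypothetical names, and the example phrases are illustrative. How each generator accepts a negative list differs, so it only produces two plain strings to paste in.

```python
from dataclasses import dataclass, field


@dataclass
class PortraitPrompt:
    """Hypothetical helper: keeps the specific parts of a prompt separate
    so none of them gets dropped between iterations."""
    subject: str
    composition: list[str] = field(default_factory=list)
    light_and_pose: list[str] = field(default_factory=list)
    avoid: list[str] = field(default_factory=list)

    def positive(self) -> str:
        # Subject first, then framing, then the directing-an-actor details.
        return ", ".join([self.subject, *self.composition, *self.light_and_pose])

    def negative(self) -> str:
        # Plain comma-separated list; adapt to whatever syntax the tool expects.
        return ", ".join(self.avoid)


prompt = PortraitPrompt(
    subject="a thoughtful woman by a rainy window",
    composition=[
        "off-center composition",
        "three-quarter view",
        "eyes looking away toward the rain",
    ],
    light_and_pose=[
        "soft window light on her shoulder",
        "slight slump in her posture",
    ],
    avoid=["direct eye contact", "centered framing", "cleavage"],
)

print(prompt.positive())
print(prompt.negative())
```

The split is deliberate. The positive string carries the specificity that actually moved the results, while the negative list stays short and acts as a backstop rather than the main lever.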

The best images often live in the tension between what the model wants to give you and what you insist on.

TL;DR: AI image tools default to centered stares and idealized women because that is what the internet taught them. The more you know the data patterns, the better you can dance with them instead of against them.

I keep coming back to this work because the friction feels alive. Every stubborn stare back at the camera reminds me I am not just typing prompts. I am negotiating with a reflection of all the images humanity has ever loved, sold, or lingered on. Sometimes I win the negotiation. Sometimes the AI wins. The dialogue itself is the creative act.

There is poetry in that push and pull. In learning to make the pixels breathe on my terms instead of the dataset’s.

Steve Teare
video alchemist

TerminallyBored.Monster
Palouse, Washington USA