GPT Image 2 Prompt Writing Guide: 7 Rules for 90% Hit Rate
2026/04/23


A practical GPT Image 2 prompt writing guide from 200+ generations. The 7 rules, structure, keywords, and anti-patterns for one-shot success.

If you've tried GPT Image 2 and felt like it ignores half your prompt, the issue is almost never the model; it's the way the prompt is written. After 200+ generations and a side-by-side comparison of hit rates, the same 7 rules account for the difference between "first-try success" and "five retries until I gave up."

This is a practical GPT Image 2 prompt writing guide. Every rule below is something you can apply to your next prompt in 30 seconds.

Why most GPT Image 2 prompts fail

Three patterns cause about 80% of prompt failures:

  1. Treating GPT Image 2 like Stable Diffusion: stuffing the prompt with masterpiece, 8k, ultra detailed, high quality keyword soup. These tokens are noise to GPT Image 2.
  2. Writing unstructured run-on sentences: one long English or Chinese sentence with everything jumbled together. GPT Image 2 follows structure. Give it a structured prompt and you get a structured image back.
  3. Forgetting to quote on-image text: 'the headline says limited offer' is far less reliable than 'the headline says "Limited Offer"'. The quotes change everything.

In these tests, fixing only those three roughly doubled the hit rate. Below are the 7 rules in detail.

Rule 1: Structure your prompt — subject, scene, style, text, camera

A reliable GPT Image 2 prompt has 5 ordered components:

  • Subject: the main object or character (example: a white stainless steel water bottle)
  • Scene: background and environment (example: on a beige linen tablecloth, soft indoor light)
  • Style: visual mood and reference (example: editorial product photography, premium feel)
  • Text: all on-image text, in quotes (example: top-left red badge: "50% off")
  • Camera: lens, angle, lighting (example: 45-degree side light, shallow depth of field)

Stitch them together with commas. A complete prompt looks like:

A white stainless steel water bottle, on a beige linen tablecloth,
soft indoor light, editorial product photography, premium feel,
top-left red badge "50% off", bottom black bold text
"Daily Commute Companion", 45-degree side light, shallow depth of field.

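This order works because GPT Image 2 is built on a language model and follows narrative order: first what the subject is, then where it sits, how it looks, what it says, and how it is shot. Random order tends to produce random output.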
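If you write a lot of prompts, one way to keep this order fixed is to build each prompt from named parts. Here is a minimal Python sketch. The `ImagePrompt` class is a hypothetical helper made up for illustration and is not part of any SDK. It only assembles the prompt string and does not call any API.

```python
from dataclasses import dataclass


@dataclass
class ImagePrompt:
    """The five Rule 1 components, always rendered in the same order."""
    subject: str
    scene: str
    style: str
    text: str
    camera: str

    def render(self) -> str:
        # Keep the subject -> scene -> style -> text -> camera order,
        # drop any component left empty, and join the rest with commas.
        parts = (self.subject, self.scene, self.style, self.text, self.camera)
        return ", ".join(p.strip() for p in parts if p.strip()) + "."


prompt = ImagePrompt(
    subject="A white stainless steel water bottle",
    scene="on a beige linen tablecloth, soft indoor light",
    style="editorial product photography, premium feel",
    text='top-left red badge "50% off", bottom black bold text "Daily Commute Companion"',
    camera="45-degree side light, shallow depth of field",
)
print(prompt.render())
```

The printed result is the example prompt above on a single line. For an image with no on-image text, pass an empty string for `text`. The other components keep their order.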

Rule 2: Quote every piece of on-image text

This is the single highest-leverage rule. Compare:

❌ the headline says limited offer ✅ the headline reads "Limited Offer"

In these tests, that difference alone produced a 30-40 percentage-point gap in text-rendering accuracy. Why? The quotes tell the model "this exact string is what you render," instead of "describe the concept of a limited offer."

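The same applies to non-Latin text. Both versions below mean "the headline says limited-time 50% off," but only the second quotes the string to render:

❌ 标题写限时五折 ✅ 标题写 "限时五折"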

When you have multiple text elements:

Headline at top reads "2026 Spring Collection",
subhead reads "30% Off Sitewide",
bottom-left small text reads "Code: SPRING30",
right-side vertical text reads "Limited Time".

Each piece quoted, each location specified.
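When the text elements come from a spreadsheet or a product feed, a small helper can enforce both halves of this rule at once: every string gets quotes and every string gets a location. Below is a minimal Python sketch. It assumes the elements arrive as (location, text) pairs, and `describe_text_elements` is a hypothetical name.

```python
def describe_text_elements(elements: list[tuple[str, str]]) -> str:
    """Turn (location, exact_text) pairs into one quoted clause per element."""
    if not elements:
        return ""
    clauses = []
    for location, text in elements:
        if not location.strip():
            # Every element needs a location, so fail loudly.
            raise ValueError(f"text {text!r} has no location")
        # Swap inner double quotes for single quotes so they cannot
        # close the quoted string early.
        literal = text.replace('"', "'")
        clauses.append(f'{location.strip()} reads "{literal}"')
    return ",\n".join(clauses) + "."


print(describe_text_elements([
    ("Headline at top", "2026 Spring Collection"),
    ("subhead", "30% Off Sitewide"),
    ("bottom-left small text", "Code: SPRING30"),
    ("right-side vertical text", "Limited Time"),
]))
```

Passing in the four elements above reproduces the multi-element example line for line. Non-Latin strings such as "限时五折" pass through unchanged, inside their quotes.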

Rule 3: Specify location for every element

GPT Image 2 understands spatial language well, but only if you use it.

Vague: a logo and some text on the image
Precise: a circular logo in the top-left corner, three lines of text in the bottom-right corner

Spatial vocabulary that works reliably:

  • top-left / top-right / top-center / bottom-left / bottom-right / bottom-center
  • centered / vertically centered / horizontally centered
  • foreground / midground / background
  • above the headline / below the subhead / next to the icon

When you have 3+ elements, every element gets a location. No exceptions.
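For templated prompts, you can check this rule automatically before you generate. The sketch below is a rough heuristic. It assumes each element is described in its own string and flags any description that contains none of the spatial terms listed above. `missing_locations` and the term list are illustrative. Plain substring matching will also miss phrasings such as "in the upper corner."

```python
# Spatial vocabulary from the list above (illustrative, not exhaustive).
# "centered" also covers "vertically centered" and "horizontally centered".
LOCATION_TERMS = (
    "top-left", "top-right", "top-center",
    "bottom-left", "bottom-right", "bottom-center",
    "centered", "foreground", "midground", "background",
    "above the", "below the", "next to the",
)


def missing_locations(elements: list[str]) -> list[str]:
    """Return element descriptions that contain none of LOCATION_TERMS."""
    return [
        desc for desc in elements
        if not any(term in desc.lower() for term in LOCATION_TERMS)
    ]


layout = [
    "a circular logo in the top-left corner",
    "three lines of text in the bottom-right corner",
    "a red sale badge",
]
print(missing_locations(layout))
```

Here the badge is the element that gets flagged, because it has no position yet.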

Rule 4: Constrain the negative — say what you DON'T want

Stable Diffusion-style tools have an explicit "negative prompt" field. GPT Image 2 doesn't, but it understands plain-language constraints:

... no text on the bottle itself,
no shadows on the background,
no other objects in frame,
no watermark.

Plain-language constraints are especially useful for:

  • Removing watermarks (no watermark, no logo overlay)
  • Cleaning busy backgrounds (solid plain background, no decorations)
  • Avoiding extra hands or fingers (hands clearly visible, anatomically correct)
  • Preventing over-decoration (minimalist, no extra ornaments)

In these tests, spending 10 seconds writing down what you don't want eliminated about 1 in 5 retries.
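If you reuse the same exclusions across many prompts (no watermark, no other objects in frame), you can keep them in a list and append them automatically. Below is a minimal Python sketch. `add_negatives` is a hypothetical helper that turns each item into a "no ..." clause at the end of the prompt.

```python
def add_negatives(prompt: str, exclusions: list[str]) -> str:
    """Append a plain-language "no ..." clause for each exclusion.

    Duplicate exclusions are dropped case-insensitively; order is preserved.
    """
    seen = set()
    clauses = []
    for item in exclusions:
        item = item.strip()
        if item and item.lower() not in seen:
            seen.add(item.lower())
            clauses.append(f"no {item}")
    if not clauses:
        return prompt
    # Drop the prompt's final period so the constraints continue the sentence.
    return prompt.rstrip().rstrip(".") + ", " + ", ".join(clauses) + "."


print(add_negatives(
    "A white water bottle on a beige linen tablecloth.",
    ["text on the bottle itself", "shadows on the background",
     "other objects in frame", "watermark", "Watermark"],
))
```

Putting the exclusions at the end leaves the 5-component order from Rule 1 intact.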

Rule 5: Anchor the style with a reference, not adjectives

Adjectives like "beautiful," "stunning," and "amazing" tell the model almost nothing. An anchored reference tells it a great deal.

Weak: a beautiful illustration of a girl
Strong: a Studio Ghibli style illustration of a girl, soft watercolor textures, warm color palette

High-leverage style anchors:

  • Illustration: Studio Ghibli, Pixar, Cartoon Network 2010s, Adventure Time, Genshin Impact
  • Photography: Wes Anderson, Annie Leibovitz, National Geographic, Vogue editorial, Kodak Portra 400
  • Painting: Monet impressionism, Van Gogh post-impressionism, Hopper realism, ukiyo-e
  • Modern: Y2K aesthetic, vaporwave, brutalist design, Memphis pattern, Bauhaus
  • Cinematic: Wong Kar-wai, Christopher Nolan, A24 film palette, Blade Runner 2049

The model knows these references. Use them.

Rule 6: Lock the camera and lighting in real photography terms

For photo-realistic outputs, the difference between amateur and pro is camera vocabulary.

Beginner: a realistic photo of a coffee cup on a desk
Pro:

A coffee cup on a wooden desk, shot on Sony A7R IV, 35mm f/2.8 lens,
shallow depth of field, soft natural window light from the left,
golden hour color temperature, slight film grain.

Camera terms that consistently improved realism in these tests:

  • Lens: 35mm, 50mm, 85mm portrait lens, wide-angle 24mm, macro 100mm
  • Aperture: f/1.4, f/2.8, shallow depth of field, deep focus
  • Body: Sony A7R IV, Canon EOS R5, Leica M11, Hasselblad medium format
  • Light: golden hour, blue hour, softbox studio lighting, Rembrandt lighting, rim light
  • Film: Kodak Portra 400, Fujifilm Velvia, Ilford HP5 black and white

These are not decorative adjectives. They are technical instructions the model knows how to interpret.
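One way to "lock" these choices across a series of images is to define the camera setup once and reuse it in every prompt. Below is a minimal Python sketch. `CameraSpec` is a hypothetical helper, and its fields simply mirror the categories listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CameraSpec:
    """Photography terms from the list above; every field is optional."""
    body: Optional[str] = None    # e.g. "Sony A7R IV"
    lens: Optional[str] = None    # e.g. "35mm f/2.8 lens"
    focus: Optional[str] = None   # e.g. "shallow depth of field"
    light: Optional[str] = None   # e.g. "soft natural window light from the left"
    film: Optional[str] = None    # e.g. "Kodak Portra 400"

    def render(self) -> str:
        # "shot on <body>" first, then the remaining filled-in fields in order.
        parts = [f"shot on {self.body}"] if self.body else []
        parts += [v for v in (self.lens, self.focus, self.light, self.film) if v]
        return ", ".join(parts)


desk_setup = CameraSpec(
    body="Sony A7R IV",
    lens="35mm f/2.8 lens",
    focus="shallow depth of field",
    light="soft natural window light from the left",
)
for subject in ("A coffee cup on a wooden desk", "A notebook on a wooden desk"):
    print(f"{subject}, {desk_setup.render()}.")
```

Reusing one frozen spec keeps the camera wording identical across a batch of prompts. Only the subject and scene text change from image to image.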

Rule 7: Iterate with directed edits, not full regenerations

This is where most users waste an estimated 70% of their API budget.

Bad workflow:

Generate → not perfect → tweak prompt → regenerate from scratch → composition
changes → cry → repeat 5 times.

Good workflow:

Generate → not perfect → "in this image, change [X] to [Y],
keep everything else identical" → done.

GPT Image 2 supports multi-turn directed editing that preserves the rest of the image. This is its single biggest cost-saver.

Examples of effective directed-edit prompts:

"Change the model's jacket from navy to beige. Keep face,
background, lighting, and pose unchanged."

"Replace the headline text with 'Spring Sale'. Keep all other
text, layout, and styling identical."

"Remove the watermark in the bottom-right corner. Keep
everything else exactly the same."

The phrase "keep everything else identical" is the magic incantation. Don't skip it.
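Because directed edits follow a fixed formula, you can generate their wording. The Python sketch below builds only the instruction text. Sending it along with the previous image, whether in a ChatGPT conversation or through an image-edit API call, is outside its scope. `directed_edit` is a hypothetical helper, not an SDK function.

```python
from typing import Sequence


def directed_edit(change: str, keep: Sequence[str] = ()) -> str:
    """Build a one-change edit instruction that pins everything else.

    `change` is a single imperative such as "Change the jacket from navy
    to beige"; `keep` optionally names elements to call out explicitly.
    """
    change = change.strip().rstrip(".")
    if not change:
        raise ValueError("describe exactly one change")
    # Lower-case only the first letter so the change reads as a clause.
    sentences = [f"In this image, {change[0].lower()}{change[1:]}."]
    if keep:
        items = list(keep)
        if len(items) == 1:
            listed = items[0]
        else:
            listed = ", ".join(items[:-1]) + ", and " + items[-1]
        sentences.append(f"Keep {listed} unchanged.")
    # The phrase this rule says never to skip.
    sentences.append("Keep everything else identical.")
    return " ".join(sentences)


print(directed_edit(
    "Change the model's jacket from navy to beige",
    keep=["face", "background", "lighting", "pose"],
))
```

Every generated instruction ends with "Keep everything else identical." The sentence is appended whether or not specific elements are listed.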

Putting it all together: a complete real-world prompt

Here's a prompt that uses all 7 rules at once. This is for an e-commerce hero image:

A white stainless steel insulated water bottle, standing upright
on a beige linen tablecloth, with soft window light from the left
at 45 degrees, premium minimalist product photography style.

Top-left red rectangular badge reads "Limited 50% Off",
top-right gold circular badge reads "24h Hot/Cold",
below the bottle bold black headline reads "Daily Commute Companion",
bottom-center small text reads "Tap to Shop".

Shot on Sony A7R IV, 50mm f/2.8 lens, shallow depth of field,
clean composition, no other objects in frame, no watermarks,
1:1 aspect ratio.

This kind of prompt typically produces a usable result on the first or second try, instead of the 5-7 retries you'd need with a vague prompt.

Common GPT Image 2 prompt anti-patterns

A short list of things to stop doing immediately:

  • masterpiece, 8k, ultra detailed keyword stuffing. Why it fails: noise to GPT Image 2. Instead: use real style anchors (Rule 5).
  • A single run-on sentence with no commas. Why it fails: the model has trouble parsing the structure. Instead: use the 5-component structure (Rule 1).
  • Describing text as a concept (a sale headline). Why it fails: the right words won't render. Instead: always quote the exact string (Rule 2).
  • Mixing languages unintentionally. Why it fails: the model can't tell which language to render. Instead: write instructions in one language and quote on-image text in the target language.
  • 50-line mega-prompts. Why it fails: diminishing returns past ~15 specifications. Instead: cap at 10-15 specs and use directed edits for refinements (Rule 7).
  • No mention of aspect ratio. Why it fails: model defaults vary. Instead: always end with an aspect ratio such as 1:1, 16:9, or 9:16.
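Several of these anti-patterns are mechanical enough to lint for before you spend a generation. Below is a rough Python sketch. It assumes a comma-separated prompt like the ones in this guide. `lint_prompt`, the filler-token list, and the 25-word run-on threshold are illustrative choices, not established values. It cannot detect conceptual text descriptions or unintentional language mixing, so those still need a human read.

```python
import re

# Stable Diffusion-style quality tokens (the keyword-stuffing row above).
FILLER_TOKENS = ("masterpiece", "8k", "ultra detailed", "high quality", "best quality")
# Common aspect ratios written as W:H.
ASPECT_RATIO = re.compile(r"\b(?:1:1|16:9|9:16|4:3|3:4|3:2|2:3)\b")
# Split clauses on commas, semicolons, newlines, or a sentence-ending period
# (a period followed by whitespace or the end), so "f/2.8" stays intact.
CLAUSE_SPLIT = re.compile(r"[,;\n]|\.(?=\s|$)")


def lint_prompt(prompt: str, max_specs: int = 15, run_on_words: int = 25) -> list[str]:
    """Return warnings for the anti-patterns that can be checked mechanically."""
    warnings = []
    lowered = prompt.lower()

    # Whole-word matches only, so "18k gold" does not count as "8k".
    found = [t for t in FILLER_TOKENS if re.search(rf"\b{re.escape(t)}\b", lowered)]
    if found:
        warnings.append("keyword stuffing: " + ", ".join(found))

    # Treat each non-empty clause as one "specification" (a rough proxy).
    specs = [c for c in CLAUSE_SPLIT.split(prompt) if c.strip()]
    if len(specs) == 1 and len(prompt.split()) > run_on_words:
        warnings.append("single run-on sentence; split into components")
    if len(specs) > max_specs:
        warnings.append(f"{len(specs)} specs; consider capping at {max_specs}")

    if not ASPECT_RATIO.search(prompt):
        warnings.append("no aspect ratio; end with e.g. 1:1, 16:9, or 9:16")
    return warnings


print(lint_prompt("masterpiece, 8k, ultra detailed, a cat"))
```

An empty list means none of the checkable anti-patterns were found. It does not mean the prompt is good.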

Quick checklist before hitting Generate

Before you submit any GPT Image 2 prompt, run through:

  • Does it have all 5 components (subject, scene, style, text, camera)?
  • Is every piece of on-image text in quotes?
  • Does every element have a specified location?
  • Have I excluded what I don't want?
  • Is the style anchored to a real reference?
  • Are camera and lighting specified (for photo)?
  • Is the aspect ratio at the end?

In these tests, prompts that checked all 7 boxes hit on the first or second try about 90% of the time.

Want to skip the writing entirely?

If you want pre-written GPT Image 2 prompts you can copy-paste directly, browse gpt-image2.art/explore — every example image has its source prompt visible, organized by use case (e-commerce, social media, character design, photography, infographics, posters).
