10 Image Prompting Mistakes—and how to avoid them
This post is brought to you by the AI Collaborative. Get free generative AI resources and community by signing up here.
Writing prompts for AI image generators such as Stable Diffusion, Flux, or Midjourney is far more of an art than a science. Unlike LLMs, which are trained to follow instructions and reason through a response, image models are still a bit of an enigma when it comes to finding the perfect prompt.
I’m all for throwing word salad at the wall and seeing what sticks, but in my workshops and coaching, I’ve come across several common mistakes that can affect your ability to generate the image you’re imagining.
In this article, I’ll walk through some of those mistakes, and offer some suggestions on how to avoid them.
1. You’re prompting like you’re talking to a chatbot.
I know you’ve been told that you should say “please” when you’re interacting with AI, either to get better results or to make sure that our future robot overlords destroy you last when they inevitably take over the world.
That’s all well and good if you’re interacting with ChatGPT. It’s counterproductive when you’re prompting an image model. Language such as “Please create an image of…” or “the image should contain elements such as…” is unnecessary and distracts from the descriptive elements that the model can act on.
Instead, just describe the details of the image that you want the model to produce. See my Prompt Pyramid method for structuring effective image prompts.
Of course, if you’re using DALL-E 3 inside ChatGPT or a feature such as Ideogram’s Magic Prompt, you are talking to a chatbot, which then writes the prompt that gets sent to the image model.
2. Your prompts are too long.
Each image model can only process a finite amount of information. There are two areas where prompt length will cause problems—first in the text encoder and second in the model itself.
The text encoder is responsible for converting your human language into an embedding that the model can understand. Rather than words, models understand concepts in discrete units called tokens.
Each generation and family of image models uses a different set of text encoders, and each of those text encoders has a hard limit on the number of tokens it can process in a single embedding.
- Flux: T5 encoder handles 512 tokens (~375 words), while the secondary CLIP-L encoder handles 77 tokens (~50 words). Keep the most important features at the beginning of the prompt.
- Stable Diffusion 3.5: SD 3.5’s implementation of T5 can encode 256 tokens (~200 words), while the CLIP-L and CLIP-G encoders can handle 77 tokens (~50 words).
- Older Stable Diffusion models: SD 1.5-based models use CLIP-L, while SDXL models use CLIP-G and CLIP-L. Both of these encoders handle 77 tokens (~50 words).
There are ways to split your prompt and generate embeddings for smaller chunks at a time, but long story short, if your prompt is longer than the text encoder can handle, anything after the cutoff will be ignored. Be descriptive, but be efficient, and make sure the most important concepts come at the beginning of your prompt.
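If you want to check where a prompt will get cut off, you can run it through the same tokenizers the encoders use. Here’s a minimal sketch with Hugging Face transformers (the tokenizer checkpoints below are common stand-ins; use whichever encoders your model actually ships with):

```python
# Rough prompt-length check; checkpoint names are examples, not requirements.
from transformers import CLIPTokenizer, T5TokenizerFast

clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")  # CLIP-L
t5_tok = T5TokenizerFast.from_pretrained("google/t5-v1_1-xxl")             # T5

prompt = "cinematic photo of a lighthouse at dusk, crashing waves, volumetric fog"

# Counts include the tokenizers' special tokens, which also count toward the limit.
print("CLIP-L tokens:", len(clip_tok(prompt).input_ids))  # hard limit: 77
print("T5 tokens:", len(t5_tok(prompt).input_ids))         # limit depends on the model (256 or 512)
```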
The second area where length will work against you is in the sheer volume of details that you try to cram into a prompt. When the model generates your image, it looks at all of the concepts in your prompt and takes one shot at what that image might look like. Even if the concepts can all be encoded, the model can only do so much to include them.
3. You’re indecisive.
I see endless prompts for “generate an image with this or maybe that.” On the whole, an image model can’t follow instructions. By including both this and that in the prompt, you’re telling the model that you want it to produce both, not one or the other.
Be decisive and clear about what you want to see in the image. If you’re not sure whether you want the model to include one element or another, pick one, and then generate another image with the other option.
4. Your prompts contradict themselves.
No joke, not too long ago I saw a prompt asking the model to produce an image of “a bald man with yellow hair.”
If you contradict yourself in a prompt, the model will still generate an image. It just might not be exactly what you were going for.
You can use contradictions strategically to balance out concepts in a prompt, in which case, go for it. Otherwise, keep an eye out for clashing concepts in your prompt.
5. Your prompts aren’t written for the model you’re using.
Every image model is trained on images with captions that describe what is in the image. As the model is trained, it learns the relationships between features in the caption and the image, so that when it’s time to generate an image from a text prompt, the model can recall what it learned.
The most effective prompts for image models are those that reflect the captions on which the model was trained.
Back in the days of Stable Diffusion 1.5, image datasets were largely captioned by hand, scraped from the internet, or captioned with a predictive tagging tool. Prompts were just a jumbled array of keywords and phrases that might be associated with the desired style of image, and the model often didn’t capture the nuance of what the user was going for.
As models have matured, however, so has the role of captions in the training process. Today, images are being captioned automagically with LLMs, leading to richer and more accurate captions.
SD 1.5/SDXL-based models were typically trained on tag-based captions, so prompt them the same way: succinct, to the point, tell the model what you want.
SD 3.5/Flux/DALL-E models are typically trained on a combination of human-written captions and synthetic captions written by LLMs. The captions, and therefore the most effective prompts, tend to be more nuanced and written in natural, descriptive language.
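To make the difference concrete, here’s the same scene written both ways (purely illustrative prompts, not tested recipes):

- Tag style (SD 1.5/SDXL): “woman, red raincoat, rainy city street, night, neon signs, reflections, bokeh, photorealistic”
- Natural language (SD 3.5/Flux/DALL-E): “A photorealistic night scene of a woman in a red raincoat on a rain-slicked city street, neon signs reflecting in the puddles, soft bokeh in the background.”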
When in doubt, check the model card, note how the sample images were prompted, and emulate that prompting style. My Image Prompt Generator custom GPT has been instructed on how to prompt for various models to eliminate the guesswork.
6. You’re writing poetry, not an image prompt.
LLMs are particularly prone to writing image prompts that have more abstract, emotional language than actual descriptive details. “The image evokes a sense of curiosity and wonder, drawing the viewer in…”
That’s great for writing your next Harlequin romance novel, not so much for prompting an image model. As we covered above, your image prompts will be most effective when they match the style of captions the model was trained on.
Effective image prompts tend to be clear, concise, and focused almost entirely on the visual elements of the scene. Write your prompt as if you were describing the scene to someone who couldn’t see.
Sure, concepts such as “curiosity” or “peace” may have a desirable effect on the overall image, but those details should be concise and relegated to the end of the prompt after the important details are out of the way.
Refer back to my Prompt Pyramid method for ideas on how to incorporate these details.
7. You’re using negative prompts incorrectly.
Out of the box, most image models have some sort of implementation for negative prompts—prompts that describe to the model what you don’t want. Some models—such as current versions of Flux—don’t support negative prompts.
There are three areas where you may use negative prompts incorrectly.
- You’re describing what you don’t want in the main prompt, such as “An image of a cute puppy dog with no collar.” Image models don’t follow instructions, so the model will just see “collar” in the prompt and assume you want the puppy to be wearing a collar.
- You’re using the wrong syntax. I’ve seen users try to include operators like -something -another thing in their prompts. This typically doesn’t work unless the app you’re using to generate your images has built functionality to support it. Generally speaking, models that support a negative prompt expose it as a separate field instead. (An exception is Midjourney, which uses the syntax --no something to delineate the negative prompt.)
- You’re trying to use a negative prompt with a model that doesn’t support it. Flux, for example, doesn’t use the standard implementation of CFG (classifier-free guidance) that negative prompts require, so negative prompts don’t work with Flux models.
If the model you’re using does support negative prompts, they can be a valuable tool, but you need to understand how they work in order to use them effectively.
Negative prompts aren’t firm instructions for the model to follow. Negative prompts are more of a nudge that guides the model away from the concepts in your negative prompt.
If the subject of your prompt is more strongly correlated with a concept than your negative prompt—such as trying to prompt for Tom Selleck without his iconic mustache—it may still win out in the tug-of-war between the two.
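If you’re running a model locally, that separate negative prompt field is usually just another parameter. Here’s a minimal sketch with the diffusers library (the model id, prompts, and settings are only examples):

```python
# Minimal negative-prompt sketch with diffusers; model id and prompts are examples.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="studio portrait of a man, short dark hair, clean shaven",
    negative_prompt="mustache, beard, facial hair",  # a nudge away from these concepts, not a guarantee
    guidance_scale=7.5,  # negative prompts rely on classifier-free guidance
).images[0]
image.save("portrait.png")
```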
8. You’re using the wrong syntax.
Every image generator has its own syntax—the markup that you add around your prompt to make the model behave differently. This isn’t a feature of the image model itself; the markup is parsed by the front end/UI and it makes the adjustments before sending the request to the model.
For example, Midjourney uses --no something for a negative prompt, or a double-colon syntax to separate concepts or to give more weight to a part of the prompt (e.g. a hot:: dog::2).
By contrast, many Stable Diffusion UIs use the syntax (hot:2) dog for prompt weighting. Many image models—particularly commercial models that you access online—don’t support weighting at all.
If you see anything other than text in an image prompt you encounter online, make sure that the image generator you’re using supports that syntax. If it doesn’t, the model will still generate an image; it will just consider the unsupported syntax to be part of the prompt.
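To see why, here’s a toy version of what a front end does with weighting syntax before the prompt ever reaches the model (a simplified sketch of A1111-style parsing, not any UI’s actual implementation):

```python
import re

# Toy parser for "(phrase:weight)" spans; a simplified stand-in for what some
# Stable Diffusion UIs do before the prompt reaches the text encoder.
WEIGHT_RE = re.compile(r"\(([^:()]+):([\d.]+)\)")

def parse_weights(prompt: str):
    """Return the plain prompt plus a list of (phrase, weight) pairs."""
    weights = [(m.group(1), float(m.group(2))) for m in WEIGHT_RE.finditer(prompt)]
    plain = WEIGHT_RE.sub(lambda m: m.group(1), prompt)
    return plain, weights

print(parse_weights("(hot:2) dog"))     # ('hot dog', [('hot', 2.0)])
print(parse_weights("a hot:: dog::2"))  # ('a hot:: dog::2', []) -- Midjourney syntax means nothing here
```

If your generator doesn’t run anything like this, the parentheses and numbers just get passed along as literal prompt text.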
9. You’re relying on prompting alone.
When it comes to the overall creative pipeline, prompting may be the most trivial task in executing your creative vision with image models. I know that upsets some people who fancy themselves “prompt engineers,” but it’s true.
I often see it play out like this:
- A user generates an image. It may be close, but not exactly what they’re going for.
- They change up their prompt a bit and generate a new image. This one gets some of the details right, but now it’s missing others.
- This continues for the next dozen or so images, and then they give up because “this image model is the worst.”
Here’s the thing. Prompting alone often isn’t sufficient for creating the image you’re envisioning. There is a whole suite of tools to help you produce a final image, and many of them offer more control than prompting alone.
The secret to producing stunning images with AI image models is to use all of the tools available to you to emphasize the strengths of what AI models can do, and to mitigate their weaknesses.
Here’s an overview of some of the tools that you may want to consider integrating into your creative workflow.
- Inpainting—masking and regenerating a portion of the image with a new, specific prompt.
- Character references—uploading an image of your character to help the model produce consistent characters across generations.
- Style references—same as character reference, but this time for style. Midjourney’s moodboards feature is one of the most underrated tools for guiding style.
- LoRA models—a LoRA (low-rank adaptation) is a tiny model trained to produce a specific character or style. You can train your own LoRAs with online tools such as Fal or Civitai.
- ControlNet—a set of tools that can guide the model’s output based on a sketch, a depth map, edge detection, or other techniques.
- Image-to-image—using a base image to guide the general tone and structure of the generated image (see the code sketch below).
- Edit/Instruct models—models trained to make adjustments to an image based on instructions in a text prompt.
You can see how I use several of these tools together to create a final scene in the video below.
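If you want a feel for what one of these tools looks like in code, here’s a minimal image-to-image sketch with the diffusers library (the model id, file names, and parameter values are just placeholders):

```python
# Minimal image-to-image sketch with diffusers; names and values are placeholders.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# The base image supplies the composition; the prompt supplies the details.
base = Image.open("rough_sketch.png").convert("RGB").resize((768, 512))

result = pipe(
    prompt="watercolor painting of a lighthouse at dusk, crashing waves",
    image=base,
    strength=0.6,      # lower = closer to the base image, higher = more freedom
    guidance_scale=7.5,
).images[0]
result.save("lighthouse.png")
```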
10. You’re not experimenting.
At the end of the day, these are a bunch of suggestions that I’ve gleaned after generating hundreds of thousands of images with AI image models.
Prompting an image model is more of an art than a science. Sometimes totally random and seemingly contradictory concepts can blend together to guide the image to something really unique.
Follow the guidelines above, but let them be just that—guidelines. Part of the fun of using AI tools in the creative process comes from pushing them to do things they weren’t designed to do, or using them in unorthodox and creative ways.
This post was brought to you by the AI Collaborative.