An Overview of A.I. Image Generators (January 2024)

Rob Laughter
15 min read · Jan 18, 2024


Over the past year, I’ve created well over 100,000 images with A.I. image models, jumping on new tools and models as they’ve been released to learn where they shine. I’ve learned a lot in that process, and it’s given me a solid understanding of how image models work and how the various models differ from one another.

In this post, I’ll share some of the image generators I use most often, exploring how each tool fits into my creative workflow, and examining the strengths and weaknesses of each one.

We’ll cover:

  • Midjourney
  • DALL-E 3 (ChatGPT)
  • Leonardo.ai
  • Ideogram
  • Stable Diffusion
  • Honorable Mentions

Like anything else in the world of generative A.I., all of this will change next week. A new foundation model could shake the image generation landscape. A startup could fold. Anything can happen. But as of January 2024, this should give readers a good starting point for exploring image generation.

Midjourney: My “go-to” image generator

Prompt: a cute puppy dog (2x creative upscale)

Midjourney is one of the most advanced generative image pipelines in the wild today. As of this writing, Midjourney is developing V6 of their model.

Bottom Line: Midjourney is my go-to image generator. In my opinion, it consistently produces the best quality images with the most appealing aesthetic. It’s not perfect, but it’s typically where I start.

Midjourney Quick Stats

  • Current Version: V6 alpha
  • Model: Proprietary
  • Access: Discord, Web (coming soon), API (unofficial)
  • Pricing: Starting at $10/month
  • License: Full ownership, commercial use with paid plans

Midjourney Features

Midjourney lets users generate images from text prompts, upscale images up to 2048x2048, expand images by panning left/right or up/down, vary images by region, generate images from image prompts, remix images, and create fine-tuned styles with the Style Tuner.

For help with prompting, the /describe feature can generate prompts from an uploaded image, and the /shorten command can trim unnecessary words from long prompts.

On the web, users can view and categorize their own creations, as well as browse other users’ image generations. Midjourney is currently working on enhancements to their web interface, including the ability to generate images from the website.

A full list of features can be found in Midjourney’s Quick Start guide.

Midjourney’s Strengths

Of all of the image generators available today, I think that Midjourney has the best level of detail, the widest range of artistic and aesthetic styles, and the best quality for images with a photographic style. Midjourney V6 has exceptional prompt understanding and coherence.

Midjourney’s Weaknesses

Midjourney is currently only available via Discord. Using Midjourney on their Discord server can be chaotic, so I highly recommend that users generate images by sending a direct message to the Midjourney Bot, or inviting the bot to their own server.

That said, as of this writing, Midjourney has made web-based image generation available to users who have generated 10,000 or more images with the service, and they have shared that they’ll be rolling it out to even more users soon. By the time you read this, this point may be moot.

Another weakness of Midjourney is that by default, your images are public. That means that anyone can see the images you create — and copy your prompts. Users on higher subscription tiers can make their images private with stealth mode.

DALL-E: Conversational image generation

Prompt: Create an image of a cute puppy dog, cinematic style, wide format. Rewritten with my Photographic Images custom GPT to “cinematic digital photograph of a cute puppy dog, sitting on a soft, plush blanket, with warm, soft-focus lighting, wide angle, shot on Canon EOS 5D Mark IV with a 24mm f/2.8 lens.”

OpenAI’s DALL-E 2 was one of the first image models to bring A.I. image generation into the mainstream cultural milieu. For many users, it was the first experience they had with generating an image from a simple text prompt. At the time, it was as Arthur C. Clarke wrote in his 1962 book, Profiles of the Future: An Inquiry into the Limits of the Possible, “indistinguishable from magic.”

DALL-E 2 was quickly leapfrogged by more advanced image models, but in October 2023, OpenAI released DALL-E 3, which made significant advances in image quality, resolution, and coherence. To make image generation accessible to the masses, OpenAI integrated DALL-E 3 directly into ChatGPT, eliminating the hocus pocus of writing image prompts by using GPT-4 to craft prompts based on the user’s request.

Bottom Line: Because DALL-E 3 is built into ChatGPT and eliminates the hassle of crafting detailed prompts, I tend to go to ChatGPT when I want to quickly explore new ideas and brainstorm in a back-and-forth conversation.

DALL-E Quick Stats

  • Current Version: DALL-E 3
  • Model: Proprietary
  • Access: ChatGPT (Plus plan), API, Bing Image Creator
  • Pricing: Included with ChatGPT Plus ($20/month); API usage billed per image
  • License: Full ownership, commercial use okay

DALL-E Features

DALL-E doesn’t have a lot of bells and whistles, but that’s part of the appeal. Rather than giving users a full suite of advanced features, DALL-E simplifies image generation to the absolute essentials.

Users can ask ChatGPT to generate an image, and ChatGPT will write an image prompt to generate it with DALL-E 3. Beyond asking for images in square, tall, or wide formats, there isn’t much opportunity for customization.
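If you’d rather script your image generation than chat, DALL-E 3 is also available through OpenAI’s API, which exposes the same square, wide, and tall formats as explicit sizes. Here’s a minimal sketch using OpenAI’s official Python SDK; note the revised_prompt field in the response, which reveals the rewritten prompt that was actually used (more on that below).

```python
# Minimal sketch: generating an image with DALL-E 3 via OpenAI's Python SDK.
# Assumes the openai package (v1.x) is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-3",
    prompt="A photo of a cute puppy dog",
    size="1792x1024",    # wide; "1024x1024" (square) and "1024x1792" (tall) also work
    quality="standard",  # or "hd" for finer detail
    n=1,                 # DALL-E 3 generates one image per request
)

print(response.data[0].url)             # temporary URL of the generated image
print(response.data[0].revised_prompt)  # the rewritten prompt DALL-E actually used
```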

DALL-E’s Strengths

DALL-E’s core strength is its simplicity. It just works. Whereas other image models require more careful and deliberate prompting, DALL-E uses ChatGPT to write the image prompt. To illustrate this, I prompted ChatGPT to “Create a photo of a cute puppy dog,” and it generated the following prompt:

An adorable puppy with big, expressive eyes and a fluffy coat. The puppy has a playful demeanor, with one ear perked up and the other slightly drooping. It’s sitting on a soft, grassy field, with a few colorful flowers scattered around. The lighting is soft, creating a warm, inviting scene.

The DALL-E model is one of the most coherent models in terms of picking up nuanced details in prompts. Whereas other models tend to lose or mix up details, DALL-E can often generate compositions that no other model can.

Additionally, while all generative image models currently struggle with text, DALL-E can usually render several words of legible text with some persistence and a few retries.

DALL-E’s Weaknesses

DALL-E’s biggest strength is also its biggest weakness. Many users don’t realize that ChatGPT is rewriting their prompts before sending them to DALL-E, and they wonder why the images don’t quite follow what they asked for.

That would be great if ChatGPT were any good at writing prompts for an image model, but perhaps paradoxically, the A.I. isn’t very good at writing prompts for A.I.

One common issue arises when a user asks to change or remove something from the image, and ChatGPT prompts the model to create an image “with no flowers.” Image models interpret language very differently than language models; they often see the word “flowers” without the negative context and generate more of the very thing the user wanted removed.
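For contrast, Stable Diffusion-based tools sidestep this problem by accepting a true negative prompt as a separate conditioning input, so the model is steered away from the unwanted concept rather than shown the word. A minimal sketch with Hugging Face’s diffusers library, using the public SDXL base checkpoint:

```python
# Minimal sketch: a true negative prompt in Stable Diffusion via diffusers.
# Assumes diffusers, transformers, and a CUDA-capable GPU are available.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a cute puppy dog sitting in a grassy park",
    negative_prompt="flowers",  # a separate input: suppresses flowers instead of adding them
).images[0]
image.save("puppy_no_flowers.png")
```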

ChatGPT also can’t “see” the image that DALL-E generates, so it will often confidently state that it has removed or changed something in an image, regardless of whether the image actually reflects that change.

As long as you understand how ChatGPT and DALL-E are working together, you can work around this. I’ve created an Un-Opinionated Image Gen custom GPT to run a user’s prompts verbatim, without any rewrites, for more granular control.
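If you’re calling DALL-E 3 through the API instead, OpenAI’s own documentation suggests a prompt prefix that asks the model not to rewrite your prompt. It’s a nudge rather than a guarantee, but checking revised_prompt shows how much was changed. A sketch:

```python
# Sketch: asking DALL-E 3 to use a prompt (nearly) verbatim via the API.
# The prefix below is the workaround suggested in OpenAI's documentation;
# the model may still make small edits.
from openai import OpenAI

client = OpenAI()

VERBATIM_PREFIX = (
    "I NEED to test how the tool works with extremely simple prompts. "
    "DO NOT add any detail, just use it AS-IS: "
)

response = client.images.generate(
    model="dall-e-3",
    prompt=VERBATIM_PREFIX + "a cute puppy dog",
)
print(response.data[0].revised_prompt)  # verify how closely the prompt was respected
```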

One other limitation of DALL-E is that it struggles with photographic styles; images generated with the model often look painterly or illustrated, even when prompting for a photograph.

This is in part by design. OpenAI takes responsible use of A.I. very seriously, and they understand that if users can create convincing, realistic images, those images could be used for nefarious purposes. DALL-E’s strict content policies also prohibit images featuring the likenesses of specific people such as celebrities or public figures, copyrighted characters, images that emulate a contemporary artist’s style, or images that could be used for deceptive ends.

While content restrictions are enforced on the DALL-E side of the equation, the stylistic preferences of the model can be prompted out, to a certain extent. I created a Photographic Images custom GPT to enhance prompts with photographic styles.

Leonardo.ai: A creative suite for image generation

Prompt: a cute puppy dog (PhotoReal pipeline, cinematic style, Alchemy upscale)

Leonardo is a suite of image generation tools, based on Stable Diffusion. It’s more than a simple image generator — it’s a creative toolkit, presenting the user with a variety of tools to help them execute on their creative vision.

Bottom Line: Leonardo usually isn’t my first choice, but I’ll use it when I need more control over my image generation than Midjourney or DALL-E can offer. Their PhotoReal pipeline is excellent at photographic style images.

Leonardo Quick Stats

  • Current Version: PhotoReal V2, Alchemy V2
  • Model: Stable Diffusion XL, Stable Diffusion 1.5, various fine-tunes
  • Access: Web, App
  • Pricing: Free tier (150 tokens/day). Paid plans start at $12/month.
  • License: Full ownership for paid users, image license for free users, commercial use.

Leonardo Features

Leonardo is one of the most feature-rich image generation apps available today. Based on the highly-extensible Stable Diffusion pipeline, Leonardo gives users a high degree of control over their image generation.

Users can generate images from text prompts or start with an image, choosing from a library of fine-tuned variants of Stable Diffusion models in a wide range of styles, and optionally upscaling their image with a 2x A.I. upscaler.

Leonardo’s image guidance goes beyond the typical image-to-image pipeline, giving users finer control over the images they generate. Depth-to-image extracts a depth map from the uploaded image and uses it to conform the generated image to the same structure as the original. Pose-to-image extracts a pose from any people or characters featured in the input image and generates a new image in the same pose. And edge-to-image extracts the outline of major features in the input image, using those edges to generate a new image. Users can combine two or more image guidance modes for even more control.
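If you’re curious what’s happening under the hood, this kind of image guidance is conceptually similar to ControlNet in the open-source Stable Diffusion world. Here’s a rough sketch of edge-to-image guidance using diffusers and a public ControlNet checkpoint; the model IDs and file paths are illustrative, not Leonardo’s own:

```python
# Rough sketch of edge-to-image guidance with a public ControlNet checkpoint.
# The reference image path is a placeholder.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Extract an edge map from the reference image with the Canny detector.
source = np.array(Image.open("reference.png").convert("RGB"))
edges = cv2.Canny(source, 100, 200)
edge_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The edge map constrains the composition; the prompt supplies the content.
image = pipe("a cute puppy dog, oil painting", image=edge_image).images[0]
image.save("edge_guided.png")
```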

Leonardo also lets users upload a data set and train their own fine-tuned model based on a likeness, character, or style.

Beyond image generation, Leonardo has a number of unique features that aren’t available in most image generators. These features include realtime generation, which generates polished images in real time as the user sketches out an idea, and the realtime canvas, which allows users to piece together a more complex composition by generating one part of the image at a time. The realtime canvas can also be used to refine or replace portions of a generated image.

Leonardo recently added Motion, an image-to-video model that animates static images. While generative video is currently in its early stages, the quality of Leonardo’s Motion module is comparable to, and often better than, some of the more established players in the space, such as Runway, Pika, and Stable Video Diffusion.

Leonardo’s Strengths

Leonardo’s greatest strengths are the quality of images generated in the Alchemy/PhotoReal pipeline and the robust feature set included at a very manageable price.

With multiple models to choose from, users can generate images in a wide range of styles without being limited to a specific aesthetic, and with some work, they can fine-tune a model of their own.

The PhotoReal pipeline produces images in a photographic style that rival those generated with Midjourney. Some users even prefer Leonardo’s photographic images to those produced by Midjourney.

The realtime canvas and realtime generation features in Leonardo are also innovations that you can’t find elsewhere. Personally, I find them to be more fun toys than staples in my daily workflow, but they’re interesting nonetheless.

Leonardo’s Weaknesses

Like other image generators, Leonardo’s strengths also highlight some of its weaknesses. The robust feature set makes for a cluttered workspace and gives users a steeper learning curve to get started.

The PhotoReal pipeline is very, very good at producing realistic images, but it is necessarily somewhat inflexible, trading diversity in subject matter, style, and composition for image quality. Prompting for specific poses, styles, or features can be difficult, and the image guidance options aren’t compatible with the PhotoReal pipeline.

Lastly, the results of training a personalized fine-tuned model are okay at best, and the process requires some skill and knowledge in curating and preparing datasets.

Ideogram: king of design and typography

Prompt: A social media graphic in vector illustration style, featuring a cute puppy dog and the text “Cute Puppy Dog”

Ideogram made some significant waves when it broke into the image generation scene back in August 2023, not because it produced spectacular images, but because it was able to do something that few models could — produce coherent text. Previously, Stability’s DeepFloyd IF model was the only image model that came close to producing legible text, and it was a cumbersome install that only ran on hardware with high-end graphics cards.

Word on the street is that Ideogram is built on DeepFloyd, but it makes the model more accessible by building it into a front-end product.

Bottom Line: If I need to quickly brainstorm designs with text, such as typography, social media graphics, or logotypes, I often turn to Ideogram first. Ideogram is also great at creating illustrations for things like tee shirts, stickers, etc.

Ideogram Quick Stats

  • Current Version: V0.2
  • Model: DeepFloyd (rumored)
  • Access: Web
  • Pricing: Free tier (25 generations/day). Paid plans start at $8/month.
  • License: Full ownership, commercial use okay.

Ideogram Features

Ideogram is a pretty straightforward image generator, eschewing the bells and whistles of services like Leonardo to focus on their motto — “helping people become more creative.”

Community and inspiration are core to Ideogram’s philosophy. Image generation takes place above a public feed of featured images from other users. A simple text field invites users to “Describe what you want to see here,” with a large “Generate” button beside it. Click it, wait a few moments, and you’ll see four images based on your prompt.

There are a few additional options to customize the generated image, including aspect ratio (square, wide, or tall) and a few suggested styles.

After a user generates an image, they can “remix” the image, using it as a starting point for the next generation and choosing how much influence the original will have on the new batch.

Ideogram’s Strengths

Ideogram’s core strength is its ability to create images with coherent text. While no image model generates perfect text 100% of the time — it’s an image model, after all — Ideogram renders text most reliably and creatively, in my experience.

While models such as Midjourney and DALL-E have since gained the ability to render text as well, Ideogram is still a strong contender.

Ideogram also excels at styles such as illustrations and vector art. Browsing the public image feed, you’ll see many users creating things such as tee shirt designs, sticker designs, and social media graphics. Within these styles, Ideogram is an extremely coherent and creative model.

If you’re up for spinning the roulette wheel with copyright infringement, Ideogram can also generate images of specific characters really well. What you do with that information is your business. I would think twice, even three times, before using such images in the wild.

Ideogram’s Weaknesses

If you’re looking to create images with a photographic or realistic style, you’ll probably have more luck using one of the other image generators in this guide. While Ideogram can technically produce them — and will occasionally surprise me with really interesting images — it isn’t a strength.

Ideogram doesn’t have a huge feature set by design. That can be an asset, but it also means that you’ll likely need to use other tools alongside Ideogram as you’re building out your creative toolbox.

Stable Diffusion: image generation for the mad scientist

Prompt: A candid photo of a cute puppy dog with brown, white, and black fur romping through the tall grass in a sunny park. Trees and shrubs are visible in the background. Flowers dot the ground, with other various park related things. The sun is setting in the background. Captured on a Nikon z7ii with 105mm f/1.8 NIKKOR lens. Dark, contrasty vibe. (Generated in ComfyUI with the ICBINP XL checkpoint, upscaled to 2688x1536.)

Unlike the other options on this list, Stable Diffusion isn’t a service — it’s a base model, developed by Stability AI, that powers many online image generators. While you can use it with any number of web based services, such as Leonardo, Stable Diffusion can also be installed on your local machine.

It’s free to use and the code to run it is open source, which means that if you have the hardware, you can generate unlimited images, entirely offline, at no cost. A number of community-built apps and UIs are available.

Bottom Line: Stable Diffusion is hands down my favorite tool for generating images. It’s not for the faint of heart, but if you’re willing to roll up your sleeves and tinker, no other image generator gives you the level of control and flexibility.

Stable Diffusion Quick Stats

  • Current Version: SDXL 1.0
  • Model: Open Source
  • Access: Local installation, various services, API
  • License: Full ownership, but commercial use may require a paid membership

Stable Diffusion’s Strengths

Stable Diffusion is highly customizable. Users can choose from several well-supported UIs, thousands of fine-tuned models and styles, and hundreds of extensions to add additional functionality to the image generation pipeline.

Some of the most common web UIs for Stable Diffusion include:

  • ComfyUI. Most customizable, but highest learning curve.
  • Automatic1111. Balances extensibility with accessibility.
  • Fooocus. Prioritizes ease of use over customization.

Thousands of fine-tuned models can be found on repositories such as Civitai, along with style and character models called LoRAs that let users further customize generated images.
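To make that concrete, here’s a minimal sketch of loading a community checkpoint and a style LoRA with the diffusers library; the file names are placeholders for models you’d download from a repository like Civitai:

```python
# Minimal sketch: loading a community SDXL checkpoint and a style LoRA
# with diffusers. File names are placeholders for downloads from Civitai.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "./models/icbinpXL.safetensors",  # a local fine-tuned checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Apply a style LoRA on top of the base checkpoint.
pipe.load_lora_weights("./loras", weight_name="watercolor_style.safetensors")

image = pipe(
    "a candid photo of a cute puppy dog romping through tall grass",
    num_inference_steps=30,
).images[0]
image.save("puppy_lora.png")
```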

Extensions include ControlNet for image guidance, Stable Video Diffusion and AnimateDiff for generating short videos, various upscalers, and more.

Because it is built on open source models and software, Stable Diffusion has strong community support, with many professionals and enthusiasts contributing their time to develop new and innovative solutions, making them available to intrepid users far sooner than they will be available in commercial products.

Stable Diffusion’s Weaknesses

Again, the weaknesses here are consequences of Stable Diffusion’s strengths.

If you don’t have any experience with generative A.I., Stable Diffusion probably isn’t the place to start. Installing the software to run the models can be a cumbersome process, requiring at least some knowledge of how to use the command line and how to manage software such as Python.

Running Stable Diffusion also requires ample computing resources, including sufficient memory, VRAM, and hard drive space. I have an RTX 3080 graphics card with 10 GB of VRAM, and I routinely crash when I run out of resources. Each checkpoint requires around 6.5 GB of storage on your hard drive — and that can add up fast.
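If you’re VRAM-constrained, the diffusers library exposes a few switches that trade speed for memory; a sketch, assuming you’re scripting with diffusers rather than using one of the web UIs above:

```python
# Sketch: memory-saving options in diffusers for GPUs with limited VRAM.
# Install with: pip install diffusers transformers accelerate
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,  # half precision roughly halves VRAM use
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # park idle submodules in system RAM (no .to("cuda") needed)
pipe.enable_vae_slicing()        # decode the final image in slices to cut peak VRAM

image = pipe("a cute puppy dog", num_inference_steps=30).images[0]
image.save("puppy_lowvram.png")
```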

Living on the cutting edge also means that you’ll frequently run into conflicts, crashes, and bugs.

Honorable Mentions

Prompt: a cute puppy dog (RunwayML, cinematic lens style)

The image generators above represent the options that I use most often, but this list is by no means exhaustive. There are hundreds of options to choose from, with new ones popping up every day.

Below are a few players that I didn’t think needed a deep dive but that deserve an honorable mention.

Adobe Firefly

Firefly is Adobe’s suite of proprietary image generation tools. It can be used as a standalone image generator on the web, but it also powers many of the generative features that the company is building into their products, such as Photoshop’s Generative Fill and Illustrator’s text-to-vector.

Whereas many image models are under scrutiny for being trained on copyrighted material, Firefly is unique in that it was trained entirely on licensed content. Adobe has also pioneered Content Credentials, bringing transparency to A.I. generated assets.

As part of Adobe’s ecosystem, however, Firefly has strict content guidelines, and it will become increasingly embedded in Adobe’s products.

Mage.space

Mage.space is similar to Leonardo in that it’s a web-based Stable Diffusion front end, but it’s simpler and easier to use.

Mage.space has a generous free tier, with paid plans that unlock more features.

RunwayML

Runway is an A.I. research company that partnered with Stability to train the early Stable Diffusion models. Today, they’re best known for their generative video product, Gen-2, but they also offer capable text-to-image generation, as well as a suite of unique generative A.I. tools.

Conclusion

At the end of the day, there is no shortage of options to choose from when it comes to generating images with A.I. The services I’ve shared here are the ones I use every day, and the ones that I recommend to others.

Choosing between them is more of an art than a science, and I often use multiple models in concert with each other in my creative workflow.

With the content in this post, I hope that you’ll start to see that not all image generators are created equal, and each one has its own set of strengths and weaknesses. As you use each one, look for ways to use it as a tool in the creative process, not necessarily a one-click solution.

If you’d like more deep dives like this, or if you’d like hands-on coaching and training on how you can use generative A.I. tools like these in your creative process, check out the A.I. Collaborative, a six-month cohort where you can learn and experiment in a community of like-minded peers.

Written by Rob Laughter

Rob is a creative professional exploring the intersection of technology and creativity. His current muse is generative A.I.
