A.I. Image Generation: 4 features you might not know about (but you should)

Rob Laughter
5 min read · Apr 23, 2024


This post is brought to you by the A.I. Collaborative. Learn and practice generative A.I. skills in a community of like-minded peers.

If you’ve been around the generative A.I. world for any length of time, you’ve probably generated an image or two. Whether you’re using one of the popular online image generation tools or you’re a mad scientist running Stable Diffusion models on your local machine, image generation has become super accessible for the general public.

There’s a difference, though, between simply generating an image and crafting an image that matches your creative vision, and oftentimes, the gap between the two seems uncrossable. No matter how carefully you craft your prompts, or how many times you generate variations of your image, the image model just doesn’t seem to want to cooperate.

In this post, I’m going to show you four advanced image generation features that will give you much more intentional, fine-grained control over the image generation process so that you can begin to execute on your creative vision with A.I. tools.

1. ControlNet: Ultimate Composition Control

Images generated with various ControlNet preprocessors.

The Gist: ControlNet models let you control the composition of your images by using an input image as a reference.

The Details: Creating images with a specific structure can be a challenge with prompting alone. ControlNet models guide the image generation process by first preprocessing the input image into one of several guidance images, giving the artist detailed control over the overall structure and composition of the final image.

A few of the more common ControlNet models include:

  • Depth to image. Your input image is converted to a depth map, which is then used to constrain the structure of your image.
  • Edge to image (Canny, line art). The preprocessor detects the edges in your input image, then uses that structure to guide the generation process.
  • Pose to image (OpenPose). This preprocessor detects the pose from your input image, and then generates an image with the subject in that pose.

Several images generated from the same source image using the depth ControlNet.

Others include the scribble ControlNet (for generating images from rough sketches), the reference ControlNet (which uses the input image to influence the overall “vibe” of the final image), and the recolor ControlNet, which is used for colorizing black and white images.

ControlNet models can be used to guide the subject only (e.g. pose to image), or can be used to guide the structure of the entire image (e.g. depth, line art).

How to Use: ControlNet is built into most Stable Diffusion web UIs. It’s also available in Leonardo.ai as “image guidance.”
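
If you're comfortable with a little code, the same workflow is available outside of a web UI. Here's a minimal sketch using Hugging Face's diffusers library with the depth ControlNet. The model IDs, the reference.jpg filename, and the prompt are placeholders for your own setup, and the exact API may vary between library versions.

```python
# Minimal depth-ControlNet sketch with diffusers (assumes a CUDA GPU).
# Model IDs, filenames, and the prompt are illustrative placeholders.
import torch
from PIL import Image
from transformers import pipeline
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# 1. Preprocess: convert the reference photo into a depth map.
depth_estimator = pipeline("depth-estimation")
reference = Image.open("reference.jpg")
depth_map = depth_estimator(reference)["depth"]  # grayscale PIL image

# 2. Load the depth ControlNet alongside a base Stable Diffusion model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# 3. Generate: the depth map constrains structure, the prompt supplies content.
image = pipe(
    prompt="a cozy reading nook, warm lighting, photorealistic",
    image=depth_map,
    controlnet_conditioning_scale=1.0,  # lower this to loosen the structural constraint
).images[0]
image.save("output.png")
```

Lowering controlnet_conditioning_scale relaxes the depth constraint if you want the prompt to take over more of the composition.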

2. IP-Adapter: Style and Character Transfer

Using IP-Adapter in ComfyUI to apply the style of a source image to the output image.

The Gist: IP-Adapter has been described as a “one-click LoRA.” It’s a fast, lightweight model for applying the style, composition, and/or character from an input image to generated images.

The Details: IP-Adapter works by injecting features from your input image into the model’s attention layers alongside your text prompt, steering the generation toward the reference. It’s super fast, requires no specialized training, and works very well.

IP-Adapter can transfer several details from the input image, together or independently:

  • Style. Transfers the overall style or vibe from the input image to the generated image.
  • Composition. Transfers the overall composition of the input image, independently of the style. This also transfers some of the details about the character.
  • FaceID. Transfers only the character’s likeness from the input image to the generated image. FaceID produces fairly good results, though the likeness may not be perfect.

When using IP-Adapter, users can adjust the weight, or amount of influence, that each element has on the final generation.

Image generated using Midjourney’s Style Reference feature, which is similar to IP-Adapter.

How to Use: Most Stable Diffusion web UIs have IP-Adapter support, either natively or through an extension. For ComfyUI, see IPAdapter plus.

Many online image generation services are starting to support IP-Adapter-style features. Midjourney recently released its style reference and character reference features, which work in a similar way, and Leonardo.ai has a style reference option in its Image Guidance tab.
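
If you prefer working in code, the diffusers library also supports IP-Adapter. The sketch below attaches the common SD 1.5 adapter weights and applies a style reference; the filenames and prompt are placeholders, and the scale value is just a starting point to experiment with.

```python
# Minimal IP-Adapter style-transfer sketch with diffusers (assumes a CUDA GPU).
# Repo/weight names reflect the common SD 1.5 adapter; filenames are placeholders.
import torch
from PIL import Image
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach the IP-Adapter weights to the base model -- no training required.
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
)
pipe.set_ip_adapter_scale(0.6)  # the "weight": how strongly the reference influences the result

style_image = Image.open("style_reference.png")  # the image whose vibe you want to borrow

image = pipe(
    prompt="a lighthouse on a rocky coast at dusk",
    ip_adapter_image=style_image,
).images[0]
image.save("styled_output.png")
```

The scale set with set_ip_adapter_scale corresponds to the weight mentioned above: push it higher and the output hews closely to the reference; lower it and the text prompt dominates.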

3. Inpainting: Clean Up the Fine Details

Inpainting with DALL-E’s Edit Selection feature in ChatGPT.

The Gist: Inpainting gives you fine control for correcting or changing small details by regenerating and replacing a specific area of your image.

The Details: We’ve all generated an almost perfect image with one obnoxious detail out of place—whether it’s a misshapen hand, some errant text, or another flaw that we can’t unsee.

Inpainting lets you correct tiny details or create more sophisticated images by adding new features. Simply select the area of the image that you want to adjust and type in a new prompt for the replacement.

Because inpainting works by denoising the selected area and regenerating that part of the image, it helps when the image has existing structure to edit, and some features are easier to add than others. Advanced tools (such as Stable Diffusion web UIs) give the user finer control over the inpainting process.

How to Use: Inpainting is supported in most of the common image generation apps at this point. All Stable Diffusion web UIs have some level of inpainting support, ChatGPT/DALL-E offers it through the edit selection feature, Midjourney calls it Vary (Region), and Leonardo.ai lets users inpaint in its canvas editor. Photoshop’s Generative Fill feature does the same thing natively using Adobe’s Firefly model.
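
For readers running Stable Diffusion locally, here's a minimal inpainting sketch with the diffusers library. You supply the original image plus a black-and-white mask marking the region to regenerate; the filenames, model ID, and prompt are placeholders for your own project.

```python
# Minimal inpainting sketch with diffusers (assumes a CUDA GPU).
# Filenames, the model ID, and the prompt are illustrative placeholders.
import torch
from PIL import Image
from diffusers import AutoPipelineForInpainting

pipe = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("almost_perfect.png").convert("RGB")
# White pixels in the mask mark the region to regenerate; black pixels are kept as-is.
mask_image = Image.open("hand_mask.png").convert("RGB")

fixed = pipe(
    prompt="a natural, relaxed human hand",
    image=init_image,
    mask_image=mask_image,
    strength=0.9,  # how aggressively the masked area is denoised and redrawn
).images[0]
fixed.save("fixed.png")
```

The strength parameter is the finer control mentioned above: it sets how much of the original pixels survive in the masked region, so lower values nudge the detail while higher values redraw it from scratch.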

4. Transparency: I’m Looking Through You

Comping transparent images of water glasses (generated with Leonardo) into a scene using Photoshop.

The Gist: Some A.I. image generators can now produce images with true transparency, which goes beyond simple background removal and is perfect for generating assets to use in more creative compositions.

The Details: Support for generating images with transparency is still growing. At the time of this writing, the code behind the technique had been released only a few weeks earlier, but some image generators are already integrating it into their apps.

The difference between generating images with actual transparency and just knocking out a background is that portions of the generated image can be partially opaque, letting the image blend with a larger composition.

How to Use: Transparent image generation is currently supported in the SD Forge web UI and in ComfyUI via the Layer Diffusion extension. You can also generate transparent images with Leonardo’s Transparency feature; just flip the switch when generating an image.
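
To see why true transparency matters for compositing, here's a minimal sketch using Pillow, assuming you've already generated a transparent PNG with one of the tools above. The filenames and paste coordinates are placeholders.

```python
# Minimal compositing sketch with Pillow.
# Assumes "water_glasses_transparent.png" is a PNG with a real alpha channel;
# filenames and coordinates are illustrative placeholders.
from PIL import Image

background = Image.open("table_scene.jpg").convert("RGBA")
glasses = Image.open("water_glasses_transparent.png").convert("RGBA")

# Because the asset has real alpha, semi-transparent pixels (glass, soft edges,
# shadows) blend with the background instead of showing a hard cut-out edge.
background.paste(glasses, (450, 300), mask=glasses)
background.convert("RGB").save("composited_scene.jpg")
```

A simple background knockout would give every pixel either full or zero opacity; the partial alpha values are what let the glass pick up the scene behind it.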


Rob Laughter

Rob is a creative professional exploring the intersection of technology and creativity. His current muse is generative A.I.