Consistent Styles: Midjourney vs Stable Diffusion (2024)
So you want to create visual art from text prompts. There are a few fundamental questions to ask yourself when evaluating the best AI image generator. Of course, the notion of a “best” option is always subjective. Still, there are some factors to consider that may help you make the decision.
Both services have their own unique art styles and feature sets to work with. As an open-source model, Stable Diffusion has made it possible for any company to serve up fine-tuned models via API. The neural frames image and animation generator is a prime example of this.
At the beginning of 2024, Midjourney announced a new consistent style feature for V6 that makes it easier to maintain a single character across multiple generations. Stable Diffusion can be configured to do the same, though since the base model requires a host application to run, the implementation differs slightly from one program to another.
For example, in neural frames you can train custom Stable Diffusion image models to achieve consistent styles and objects. We also offer a feature called visual echoes that makes each frame look similar to the last, while changing in subtle ways to create a smooth visual progression.
Midjourney vs Stable Diffusion overview (2024)
Stable Diffusion models and Midjourney are both rooted in deep learning, but their architecture and technology differ in some important ways. The feature sets have evolved with every update, so we’re only going to focus on the most recent set of capabilities.
Getting started with Midjourney V6
Midjourney has been celebrated by beginners for its ease of use. To access it, you can log in to their web application or create a free Discord account. Once logged in, MJ offers a big community of creators to interact with. The server’s public channels are a great place to study other people’s prompting techniques and get ideas for your own work.
The creativity really begins when you enter a private chat with the Discord bot. Type the command /imagine and press enter. Next, write a text prompt that describes the desired artistic style. On submission, it takes less than a minute to retrieve four AI-generated images. That’s the most basic overview of how the app works.
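For example, a basic prompt with an aspect ratio setting might look like this (the subject matter here is just illustrative):

```
/imagine prompt: a watercolor painting of a lighthouse at dawn, soft pastel palette --ar 16:9
```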
Here are some exciting and unique customization options that Midjourney offers, and that make it competitive with DALL-E, Stable Diffusion, and the rest. Each of these is handled with a command.
- Outpainting: Select an image and zoom out by 1.5x or 2x, without impacting the image size.
- Inpainting: Click the “Vary (Region)” button to modify the contents of your image.
- Blend: Combine two images and see how they look when fused together in a new picture.
- Aspect ratios: Control the shape and pixel dimensions of your images.
- Tune: The Style Tuner lets you fine-tune and customize the look of your images (see the example commands below).
Review the full list of Midjourney commands.
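Here is how a couple of these look when typed into the Discord chat (the prompt text is illustrative):

```
/blend
(Discord then asks you to upload the two images you want to merge)

/tune prompt: a lone astronaut crossing a red desert
```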
How to access Stable Diffusion models
There are hundreds of websites and applications running Stable Diffusion models today. Stability AI keeps its codebase open source, which is why so many companies have been able to build with it.
When Stability first launched, it didn’t actually have a web app, but it has since published a site called DreamStudio. It’s somewhat user friendly and does create high-quality images, but it’s not necessarily the best AI tool on the market. Here’s a look at the program’s core functionality:
- Text prompts: Describe what you want to see. You can also use the negative prompt section to tell it what you don’t want it to include (see the code sketch after this list).
- Style selection: Choose from a collection of custom models in different art styles. DreamStudio offers a range of options, from photorealistic to illustrative.
- Inpainting and outpainting: Highlight a region inside or outside of your image using the selector tool. Use text prompts to describe new imagery you’d like to see relative to the original.
- Image dimensions: Choose the precise width and height of your image in pixels.
- Image editing: Erase sections of your image and generate new content in its place.
- History: Review your existing images.
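To see how these controls map onto the underlying model, here’s a minimal sketch using Stability’s open-source weights through the diffusers library. This is the raw model rather than DreamStudio’s hosted service, and the checkpoint ID and prompts are illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

# Public Stable Diffusion 2.1 checkpoint, used here for illustration;
# DreamStudio runs its own hosted models behind the web interface.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a photorealistic red fox in a snowy forest",
    negative_prompt="blurry, oversaturated, text, watermark",  # what NOT to include
    width=768,   # precise pixel dimensions, as in DreamStudio
    height=512,
).images[0]
image.save("fox.png")
```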
Compared to Midjourney, DreamStudio may have a better user interface, but it’s not necessarily a better AI image generator. Its inpainting and outpainting features tend to produce results that look wildly different from the original image, making them impractical for the workflows and use cases that most people have in mind.
Fortunately, there are other companies like Leonardo.ai and neuralframes.com that leverage Stable Diffusion but offer a much better user experience. Users can upload their own datasets to train generative AI models on their own images.
Some of the same features DreamStudio offers, like the canvas editor, are executed better by Leonardo. Inpainting and outpainting tend to be much more coherent. You can read a full overview of their features here.
Achieving consistent styles and characters
One of the longstanding complaints about AI image generators is that it’s difficult to achieve continuity. How can they act as a workflow tool for artists, when the characters and image styles differ from one generation to the next? The same text prompts can create wildly different results, leading to a lot of wasted time and lost generation credits.
Imagine that you’re trying to create a comic book and want the same character to appear from panel to panel. A text prompt might get you the same art style, but the character would change every time. Fortunately, both Midjourney and neural frames (powered by Stable Diffusion) have started offering style consistency. We’ll show you how to do that on each platform now.
Midjourney introduces new “Consistent Styles” parameter
It’s finally possible to achieve a consistent style in Midjourney. The company announced two new parameters at the beginning of 2024 to help users achieve this end. One is called --sref (style reference) and the other is --cref (character reference). To get the best and most consistent output, we recommend using both parameters.
- Open the Discord bot and click the plus icon to the left of the text area.
- Select the “upload file” option and choose the image from your local computer.
- Use the /imagine command to type in a text prompt and describe what you want to see in your image.
- Press space at the end of the prompt and type the new --sref parameter (that’s two dashes) followed by a space.
- Drag and drop the image into your prompt area and make sure that the --sref text wasn’t cut in half, as this does happen sometimes.
- Add a space after the URL and this time use the --cref parameter.
- Press spacebar again and copy-paste the same URL from above.
- Submit your prompt!
Here’s what the final format will look like in the /imagine prompt field:
{text prompt describing your new image} --sref url --cref url
Remember, the URL we’re referring to is going to be created by Midjourney when you drag and drop the uploaded image into the text field. The app will use machine learning to study the style and character traits of the input image, generating new images based on the prompts you provide.
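For instance, a finished prompt might read as follows, where the URL is a placeholder for whatever link Midjourney generates from your upload:

```
/imagine prompt: a cyberpunk detective walking through neon rain --sref https://s.mj.run/AbCdEf --cref https://s.mj.run/AbCdEf
```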
Stable Diffusion supports custom AI models and fine tuning
One of Stable Diffusion’s strengths lies in the ability to train custom AI models for image generation. Companies like neural frames use the DreamBooth technique to update a base model with just a few images from a given style. This helps users take control over their output, even when they don’t have the right words to describe precisely what they want.
The video above showcases a step-by-step process for creating one of these custom models. It’s as simple as uploading a collection of around ten pictures, waiting a few minutes, and then using a text prompt to describe what you want to see. You can save these custom models and continue to generate new content with them at any point in the future.
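If you’re curious what this looks like under the hood, a DreamBooth-style fine-tune simply produces a new Stable Diffusion checkpoint that you can prompt like any other. Here’s a minimal sketch using the open-source diffusers library; the checkpoint path and the “sks” trigger token are illustrative placeholders, not neural frames’ actual pipeline:

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical local checkpoint produced by DreamBooth fine-tuning on ~10 images.
pipe = StableDiffusionPipeline.from_pretrained(
    "./my-dreambooth-model", torch_dtype=torch.float16
).to("cuda")

# The rare trigger token ("sks" here) binds the prompt to the fine-tuned
# subject, so the same character shows up across generations.
image = pipe(
    "a portrait of sks character standing on a cliff at sunset",
    negative_prompt="blurry, low quality",
    num_inference_steps=30,
).images[0]
image.save("consistent_character.png")
```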
Neural frames is unique in that you can stop once you’ve created an individual picture, or continue on to create a frame-by-frame animation. As an audio-reactive tool, it lets you change the pace and direction of your video imagery in response to any kind of music.
To maintain style and character consistency over the course of the video, the app includes a feature called Visual Echoes as explained below:
Visual echoes help to preserve the look and feel of your original image as it transforms over the course of the song. Without this parameter, characters will change wildly in a matter of seconds.
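Neural frames hasn’t published the Visual Echoes internals, but the general principle resembles image-to-image feedback: each new frame is denoised from the previous one at low strength, so most of the picture carries over. Here’s a conceptual sketch with diffusers, where the model ID, strength value, and frame count are assumptions for illustration:

```python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

# Assumed base model for illustration; any Stable Diffusion checkpoint works.
model_id = "runwayml/stable-diffusion-v1-5"
txt2img = StableDiffusionPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")
# Reuse the same weights for the img2img variant of the pipeline.
img2img = StableDiffusionImg2ImgPipeline(**txt2img.components)

prompt = "a neon city skyline, synthwave style"
frame = txt2img(prompt).images[0]  # first frame comes from text alone
frames = [frame]

# Each subsequent frame is regenerated FROM the previous frame.
# Low strength (~0.3-0.4) keeps most of the prior image, so the look
# and characters drift gradually instead of changing wildly per frame.
for _ in range(24):
    frame = img2img(prompt, image=frame, strength=0.35).images[0]
    frames.append(frame)

for i, f in enumerate(frames):
    f.save(f"frame_{i:03d}.png")
```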
Here’s a demo of an AI music video that one of our users created using a custom AI model trained on their own special style:
Visit the neural frames homepage to learn more and sign up for free to begin your own experiments today.
Pricing: Stable Diffusion vs Midjourney
Stable Diffusion is a free and open-source model. If you decide to use a web app, the price will vary depending on which service you use. For example, neural frames has a free plan, while paid subscriptions range from $19 to $99 per month.
Midjourney costs between $8 and $120 per month, with optional annual discounts. The more you pay, the more fast GPU time you receive. Users on the Pro plan enjoy near real-time upscaling, meaning you can improve image quality fast.
Timeline: Midjourney, Stable Diffusion, and DALL-E
AI art generators are still very much in their infancy. The trend began with OpenAI’s DALL-E model, released a full year before Midjourney and Stable Diffusion. Here’s a quick look at the timeline of version releases across all three models:
- January 5, 2021: DALL-E 1
- February 2022: Midjourney V1
- April 6, 2022: DALL-E 2
- April 12, 2022: Midjourney V2
- July 25, 2022: Midjourney V3
- August 22, 2022: Stable Diffusion 1
- November 5, 2022: Midjourney V4
- November 22, 2022: Stable Diffusion 2
- March 15, 2023: Midjourney V5
- May 3, 2023: Midjourney V5.1
- June 22, 2023: Midjourney V5.2
- October 15, 2023: DALL-E 3
- December 21, 2023: Midjourney V6 alpha
That brings us up to the present day. As you can see from the release cycle, Midjourney has been the most aggressive with their model updates. That doesn’t necessarily make it better or more advanced than the others. You’ll need to dig into the features of each company to decide which one is the best fit for you.
Info
No VC money, just a small team in love with text-to-video. Contact our team here: help@neuralframes.com. Our AI music video generation is inspired by the open-source Deforum algorithm, but doesn't actually use it. For inspiration on prompts, we recommend Civitai.