Flux.1 models now available for commercial use on Mystic

Black Forest Labs launched their FLUX.1 suite of text-to-image models that define a new state-of-the-art in image detail. All of these models are now available for you to run on the Mystic Model Hub.

Published
8/16/2024
Tags
Tutorial
Author
Anirudh Raghuram, Oscar Rovira

Why Flux.1? And how is it better than other text-to-image models?

Black Forest Labs, the creators behind FLUX.1 and pioneers of Stable Diffusion, have launched three variants of the FLUX.1 model suite: Pro, Dev, and Schnell. Each variant is designed to meet different needs, balancing accessibility with the model’s capabilities.

FLUX.1 [pro]: This is the highest-performing version of FLUX.1, offering state-of-the-art image generation. It excels in prompt adherence, visual quality, image detail, and output diversity.
FLUX.1 [dev]: Distilled from FLUX.1 [pro] and optimised for efficiency, this variant maintains similar image quality and prompt adherence.
FLUX.1 [schnell]: The fastest variant, optimised for speed and intended for local development and personal use.

All of the FLUX.1 models offer state-of-the-art prompt following, visual quality, image detail, and output diversity. Here are a few aspects that impressed us:

Enhanced Image Quality:

A highly accurate photo-realistic rendering of a bird's eye view of the Forbidden City in Beijing

FLUX.1 produces noticeably higher-quality images than other text-to-image models, which Black Forest Labs attributes to its architecture. The model is built on a hybrid architecture of multimodal and parallel diffusion transformer blocks, scaled up to 12 billion parameters. This design allows for a better understanding of complex prompts and results in more detailed, realistic outputs.

The flow matching technique is one of the key innovations in FLUX.1. This method improves how the model learns from and generates visual data, particularly in preserving fine details like texture, lighting, and shading. Additionally, optimisations like rotary positional embeddings and parallel diffusion transformers further enhance the model’s ability to generate high-resolution, photorealistic images.
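To make the idea of flow matching more concrete, here is a minimal sketch in plain Python. It is not FLUX.1's training code, which this article does not describe; it assumes the common straight-line (rectified flow) formulation, in which a sample moves linearly from noise to data and the model is trained to predict the constant velocity along that path. All function names, and the toy "oracle" model that stands in for a neural network, are hypothetical and used only for illustration.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


def euler_sample(predict_velocity, x0, steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = predict_velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy example: a 4-dimensional "image" and a hypothetical oracle model
# that already knows the correct velocity (a real model would be learned).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
data = [1.0, -2.0, 0.5, 3.0]


def oracle(x_t, t):
    return target_velocity(noise, data)


loss = flow_matching_loss(oracle, noise, data, t=0.3)
sample = euler_sample(oracle, noise, steps=4)
```

With a perfect velocity prediction the loss is zero and a handful of Euler steps carries the noise vector onto the data vector. In practice the velocity comes from a trained network, so the loss is minimised over many noise, data, and time samples rather than evaluated once.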

Highly accurate and photo-realistic human anatomy:

A group of high schoolers waving their hands high to pose for a class photo

If you’ve experimented with generating images of people using other text-to-image models like DALL·E or Midjourney, you’re likely familiar with their limitations in producing anatomically accurate results. These models often struggle to render human features correctly, particularly complex elements like hands and faces.

FLUX.1 addresses these challenges with significant improvements in generating anatomically accurate human figures, especially hands, which many previous models struggle with.

Text:

A photorealistic image of a hillside with a large, bold, white replica of the Hollywood sign. Instead of the word ‘Hollywood,’ the sign reads ‘Mystic’ in the same iconic style.

If you’ve ever tried generating images with text using models like DALL·E or Midjourney, you’ve probably noticed how challenging it can be for these models to render text accurately. They often confuse similar-looking letters, render fonts inconsistently, or produce completely unreadable text. FLUX.1, by contrast, performs notably well on images that contain complex text. It can handle even tricky words with repeated letters, producing accurate, clean text within the image. This makes it highly effective for designs that require precision in textual elements.

Complex composition:

3 monkeys in a standing in a sideways line, from left to right, the one on the extreme left has its eyes covered with its hands, the one in the middle has its ears covered, and the one on the left is covering its mouth with its hands.

In our experience, FLUX.1 outperforms other text-to-image diffusion models at following complex instructions about where things should go in an image.

Prompt adherence:

A wide, ultra-HD panorama of the Trisolaran world during the chaotic era. The scene clearly shows three suns in the sky, with one of them close to the horizon, causing intense heat and the ground beginning to crack. In the foreground, Trisolarans are depicted in their dehydrated state, some folded in the floor others melting down, while in the distance, massive structures designed to survive the extreme conditions crumble. The landscape is both alien and hauntingly beautiful, rendered with intricate, otherworldly details.

One of the best features of FLUX.1 is its exceptional prompt adherence. Whether provided with simple or highly detailed prompts, the model consistently generates high-quality images that closely align with the input description. With complex prompts, it allows precise control over the placement and details of objects within the scene.

Run Flux.1 on Mystic now

All three models in the FLUX.1 suite (Pro, Dev, and Schnell) are now available to run on the Mystic Model Hub.

Try Flux.1 now
