Janus-Pro is an advanced Multimodal Large Language Model (MLLM) that unifies visual understanding and image generation in a single framework. Developed by the Chinese AI company DeepSeek, it both interprets and generates images, surpassing prominent models such as OpenAI's DALL-E 3 and Stable Diffusion on text-to-image benchmarks.
Janus-Pro is the improved successor to the original Janus model, introducing optimized training strategies, an expanded dataset, and increased model scale. These enhancements have elevated its performance on tasks such as generating images from textual descriptions and analyzing visual data. On the GenEval and DPG-Bench benchmarks, Janus-Pro outperforms both Stable Diffusion 3 Medium (open source) and DALL-E 3 (commercial).
The model is now publicly available on Hugging Face, with the code released under the MIT License and the model weights under the DeepSeek Model License.
Janus-Pro introduces a novel architecture that decouples visual encoding into separate pathways for understanding and generation, while a single transformer backbone processes both. Built on the foundations of DeepSeek-LLM-1.5b-base and DeepSeek-LLM-7b-base, it incorporates the SigLIP-L vision encoder for understanding, which supports 384 x 384 image inputs. For image generation tasks, it uses a discrete tokenizer with a downsampling rate of 16.
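As a quick sanity check on those numbers, a 384 x 384 input with a downsampling rate of 16 yields a 24 x 24 grid of discrete codes, i.e. 576 tokens per generated image. The snippet below is purely illustrative arithmetic based on the figures above; exact token counts depend on the released model configuration:

```python
# Back-of-the-envelope token math for Janus-Pro's generation pathway
# (illustrative only; exact counts depend on the released configs).

IMAGE_SIZE = 384   # input resolution (384 x 384), per the model card
DOWNSAMPLE = 16    # downsampling rate of the generation tokenizer

grid_side = IMAGE_SIZE // DOWNSAMPLE      # 384 / 16 = 24 codes per side
tokens_per_image = grid_side * grid_side  # 24 * 24 = 576 tokens per image

print(f"{grid_side} x {grid_side} grid -> {tokens_per_image} image tokens")
```

This is why unified models like Janus-Pro can treat an image as just another sequence of a few hundred tokens, interleaved with text inside the same transformer.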
To learn more about its implementation, visit the official GitHub repository (https://github.com/deepseek-ai/Janus).
Janus-Pro demonstrates superior performance across multiple benchmarks, setting a new standard for unified multimodal systems. On GenEval, Janus-Pro-7B scores 0.80 overall, ahead of DALL-E 3 (0.67) and Stable Diffusion 3 Medium (0.74), and on DPG-Bench it reaches 84.19, the best result among the compared methods.
These achievements highlight Janus-Pro's strength in both multimodal understanding and text-to-image generation.
Getting started with Janus-Pro is straightforward. Access the framework via the official GitHub repository, or explore the models (Janus-Pro-1B and Janus-Pro-7B) directly on Hugging Face for implementation details and code, as in the sketch below.
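For a concrete starting point, here is a minimal image-understanding sketch adapted from the usage examples in the repository's README. It assumes the `janus` package is installed from that repo, a CUDA GPU, and a local `example.jpg` (a hypothetical file for illustration); names such as `VLChatProcessor`, `load_pil_images`, and `prepare_inputs_embeds` come from that codebase, so check the repo for the current API before relying on them:

```python
import torch
from transformers import AutoModelForCausalLM
from janus.models import VLChatProcessor   # from the deepseek-ai/Janus repo
from janus.utils.io import load_pil_images

model_path = "deepseek-ai/Janus-Pro-7B"    # the 1B variant also works

# The processor bundles the tokenizer with the image preprocessing pipeline.
processor = VLChatProcessor.from_pretrained(model_path)
tokenizer = processor.tokenizer

model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
model = model.to(torch.bfloat16).cuda().eval()

# Chat format used by the repo's examples: one image plus a text question.
conversation = [
    {
        "role": "<|User|>",
        "content": "<image_placeholder>\nDescribe this image.",
        "images": ["example.jpg"],         # hypothetical local file
    },
    {"role": "<|Assistant|>", "content": ""},
]

# Load the referenced images and batch everything into model inputs.
pil_images = load_pil_images(conversation)
inputs = processor(
    conversations=conversation, images=pil_images, force_batchify=True
).to(model.device)

# Fuse image and text embeddings, then decode the answer autoregressively.
inputs_embeds = model.prepare_inputs_embeds(**inputs)
outputs = model.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
)
print(tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True))
```

Text-to-image generation follows a similar pattern in the repository's examples, with the model emitting discrete image tokens that the tokenizer's decoder turns back into pixels.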
We are excited to see what DeepSeek will bring to the table next, as their innovations continue to push the boundaries of AI and redefine what's possible in the world of multimodal neural networks. What's even more remarkable is that the openly released weights and permissively licensed code make this level of capability affordable for everyone, bringing advanced AI to a wider audience than ever before.