
meituan-longcat/LongCat-Image-Edit

Bilingual (Chinese-English) image editing model from Meituan LongCat, served via vLLM-Omni

dense · 6B parameters · vLLM 0.12.0 + Omni

Overview

LongCat-Image-Edit is the image-editing variant of Meituan's LongCat-Image. It accepts bilingual (Chinese-English) editing instructions and is served via vLLM-Omni rather than standard vLLM. It reports state-of-the-art performance among open-source image-editing models.

Prerequisites

  • Hardware: 1x GPU with >=40 GB VRAM (a quick check is shown after this list)
  • vLLM-Omni (runs on top of vLLM 0.12.0)
  • diffusers (latest from source)
  • xformers (latest)
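
One quick way to confirm the 40 GB VRAM requirement is met (this assumes an NVIDIA GPU with the driver's nvidia-smi tool available):

nvidia-smi --query-gpu=name,memory.total --format=csv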

Installation

# Clone and install vllm-omni
git clone https://github.com/vllm-project/vllm-omni.git
cd vllm-omni
uv venv
source .venv/bin/activate
uv pip install -e . vllm==0.12.0

# Update xformers to the latest version
uv pip install -U xformers --index-url https://download.pytorch.org/whl/cu128

# Install diffusers from source (latest)
git clone https://github.com/huggingface/diffusers.git
cd diffusers
uv pip install -e .
cd ..  # return to the vllm-omni directory
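
As an optional sanity check, confirm which versions actually ended up in the environment before running inference:

python3 -c "import vllm, diffusers, xformers; print(vllm.__version__, diffusers.__version__, xformers.__version__)"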

Usage

# Run from inside the vllm-omni directory, with the virtual environment activated
python3 ./examples/offline_inference/image_to_image/image_edit.py \
    --image qwen_bear.png \
    --prompt "Add a white art board written with colorful text 'vLLM-Omni' on grassland. Add a paintbrush in the bear's hands. Position the bear standing in front of the art board as if painting." \
    --output output_image_edit.png \
    --num_inference_steps 50 \
    --guidance_scale 4.5 \
    --seed 42 \
    --model meituan-longcat/LongCat-Image-Edit \
    --cache_backend cache_dit \
    --cache_dit_max_continuous_cached_steps 2
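
The two --cache_dit* flags enable the cache-dit acceleration backend, which reuses cached DiT activations across denoising steps; they are optional and can be dropped for a plain (slower) run. Because the model is bilingual, the same script also takes Chinese instructions; the prompt below is an illustrative rough translation of the English one:

python3 ./examples/offline_inference/image_to_image/image_edit.py \
    --image qwen_bear.png \
    --prompt "在草地上添加一块写有彩色文字'vLLM-Omni'的白色画板。给熊的手里加一支画笔,让它站在画板前作画。" \
    --output output_image_edit_zh.png \
    --num_inference_steps 50 \
    --guidance_scale 4.5 \
    --seed 42 \
    --model meituan-longcat/LongCat-Image-Edit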
