
Wan-AI/Wan2.2-T2V-A14B-Diffusers

Wan2.2 video generation models — T2V/I2V MoE (14B active) and unified TI2V (5B dense), served via vLLM-Omni

MoE · 28B total / 14B active · vLLM 0.12.0 + Omni

Overview

Wan2.2 is a video generation family served via vLLM-Omni with optional Cache-DiT acceleration:

  • Wan-AI/Wan2.2-T2V-A14B-Diffusers — Text-to-Video (MoE, 14B active)
  • Wan-AI/Wan2.2-I2V-A14B-Diffusers — Image-to-Video (MoE, 14B active)
  • Wan-AI/Wan2.2-TI2V-5B-Diffusers — Unified Text+Image-to-Video (dense 5B)

Prerequisites

  • vLLM-Omni on top of vLLM 0.12.0
  • diffusers (bundled in vLLM-Omni CLI scripts)

Installation

uv venv
source .venv/bin/activate
uv pip install vllm==0.12.0
uv pip install git+https://github.com/vllm-project/vllm-omni.git@ef01223c42be10ee260b9f6e5ec31894cd09d86e

Text-to-Video (T2V)

from vllm_omni.entrypoints.omni import Omni

omni = Omni(model="Wan-AI/Wan2.2-T2V-A14B-Diffusers")
frames = omni.generate(
    "Two anthropomorphic cats in comfy boxing gear fight on a spotlighted stage.",
    height=720, width=1280,
    num_frames=81,
    num_inference_steps=40,
    guidance_scale=4.0,
)

CLI:

python examples/offline_inference/text_to_video/text_to_video.py \
  --model Wan-AI/Wan2.2-T2V-A14B-Diffusers \
  --prompt "A serene lakeside sunrise with mist over the water." \
  --height 720 --width 1280 \
  --num_frames 81 --num_inference_steps 40 \
  --guidance_scale 4.0 --fps 24 \
  --output t2v_output.mp4
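Wan's VAE compresses the temporal axis by a factor of 4, so frame counts of the form 4k+1 (such as the default 81 = 4·20 + 1) are the ones that map cleanly onto latent frames. A small helper, hypothetical and not part of vLLM-Omni, for rounding a target clip duration to a valid `num_frames`:

```python
def valid_num_frames(seconds: float, fps: int = 24) -> int:
    """Round a target duration to the nearest 4k+1 frame count.

    Wan-style video VAEs compress time 4x, so counts like 81 = 4*20 + 1
    align with latent frames. Illustrative helper, not a vLLM-Omni API.
    """
    raw = round(seconds * fps)
    k = max(1, round((raw - 1) / 4))
    return 4 * k + 1

# The default 81 frames at 24 fps corresponds to about 3.4 seconds:
print(valid_num_frames(81 / 24))        # -> 81
print(valid_num_frames(2.0, fps=16))    # -> 33
```

Pass the result as `num_frames` (or `--num_frames`) so the requested duration lands on a valid count instead of being rejected or silently adjusted.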

Image-to-Video (I2V)

import PIL.Image
from vllm_omni.entrypoints.omni import Omni

omni = Omni(model="Wan-AI/Wan2.2-I2V-A14B-Diffusers")
image = PIL.Image.open("input.jpg").convert("RGB")

frames = omni.generate(
    "A cat playing with yarn",
    pil_image=image,
    height=480, width=832,
    num_frames=81,
    num_inference_steps=50,
    guidance_scale=5.0,
)

CLI, shown here with the unified TI2V-5B model (takes both an image and a text prompt):

python examples/offline_inference/image_to_video/image_to_video.py \
  --model Wan-AI/Wan2.2-TI2V-5B-Diffusers \
  --image input.jpg --prompt "A cat playing with yarn" \
  --num_frames 81 --num_inference_steps 50 \
  --guidance_scale 5.0 --fps 16 --output ti2v_output.mp4
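For I2V the output resolution is derived from the input image, and explicit `height`/`width` values should be multiples of 16. A quick sketch, using a hypothetical helper that is not part of vLLM-Omni, for snapping an arbitrary image size to the nearest valid resolution before resizing the image:

```python
def snap_to_16(height: int, width: int) -> tuple[int, int]:
    """Round each dimension to the nearest multiple of 16 (minimum 16).

    Illustrative helper only; resize the input image to the returned
    size (e.g. with PIL's Image.resize) before passing it to generate().
    """
    def snap(x: int) -> int:
        return max(16, round(x / 16) * 16)
    return snap(height), snap(width)

# 480x832 is already valid; slightly-off sizes get snapped:
print(snap_to_16(486, 830))  # -> (480, 832)
```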

Cache-DiT Acceleration

omni = Omni(
    model="Wan-AI/Wan2.2-T2V-A14B-Diffusers",
    cache_backend="cache_dit",
    cache_config={
        "Fn_compute_blocks": 8,
        "Bn_compute_blocks": 0,
        "max_warmup_steps": 4,
        "residual_diff_threshold": 0.12,
    },
)
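Cache-DiT saves work by comparing residuals between consecutive denoising steps: the first `max_warmup_steps` steps always run the full transformer, and afterwards a step can reuse cached block outputs when the relative residual change drops below `residual_diff_threshold`. A toy model of that decision policy (heavily simplified; the real backend operates on per-block hidden states, not a precomputed list of diffs):

```python
def cache_dit_schedule(residual_diffs, threshold=0.12, warmup=4):
    """Decide, per denoising step, whether to recompute or reuse cache.

    residual_diffs[i] is the relative residual change at step i versus
    step i-1. Warmup steps always recompute. Toy sketch of the policy,
    not the actual Cache-DiT implementation.
    """
    decisions = []
    for step, diff in enumerate(residual_diffs):
        recompute = step < warmup or diff >= threshold
        decisions.append("compute" if recompute else "cached")
    return decisions

# Early steps change fast and are recomputed; later, slowly-changing
# steps are served from cache:
diffs = [0.9, 0.5, 0.3, 0.2, 0.15, 0.10, 0.08, 0.05]
print(cache_dit_schedule(diffs))
```

Raising `residual_diff_threshold` caches more steps (faster, lower fidelity); raising `max_warmup_steps` protects more of the early, fast-changing steps.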

Key Parameters

| Parameter | Default | Description |
|---|---|---|
| `height` | 720 (T2V) / auto (I2V) | Video height, a multiple of 16 |
| `width` | 1280 (T2V) / auto (I2V) | Video width, a multiple of 16 |
| `num_frames` | 81 | Number of frames to generate |
| `num_inference_steps` | 40–50 | Denoising steps |
| `guidance_scale` | 4.0–5.0 | Classifier-free guidance scale |
| `boundary_ratio` | 0.875 | MoE boundary split ratio |
| `flow_shift` | 5.0 (720p) / 12.0 (480p) | Scheduler flow shift |
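The resolution-dependent `flow_shift` defaults above can be captured in a one-line helper (hypothetical; pass the result wherever your entrypoint accepts `flow_shift`):

```python
def default_flow_shift(height: int) -> float:
    """Scheduler flow shift by output resolution: 5.0 at 720p and
    above, 12.0 for lower resolutions such as 480p. Values mirror
    the parameter table; the helper itself is illustrative.
    """
    return 5.0 if height >= 720 else 12.0

print(default_flow_shift(720), default_flow_shift(480))  # 5.0 12.0
```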
