vLLM Recipes

PaddlePaddle/PaddleOCR-VL

PaddleOCR-VL (0.9B) — compact vision-language model for document parsing, OCR, tables, formulas, charts

dense · 0.9B · 131,072 ctx · vLLM 0.11.1+ · multimodal

Overview

PaddleOCR-VL is a state-of-the-art, resource-efficient model for document parsing. Its core component, PaddleOCR-VL-0.9B, combines a NaViT-style dynamic-resolution visual encoder with an ERNIE-4.5-0.3B language model, optimized for OCR, table, formula, and chart recognition.

Prerequisites

  • Hardware: 1x GPU (small VRAM footprint)
  • vLLM >= 0.11.1 (use a nightly wheel if that release is not yet available)

Install vLLM

uv venv
source .venv/bin/activate
uv pip install -U vllm --pre \
  --extra-index-url https://wheels.vllm.ai/nightly \
  --extra-index-url https://download.pytorch.org/whl/cu129 \
  --index-strategy unsafe-best-match

Launch command

vllm serve PaddlePaddle/PaddleOCR-VL \
  --trust-remote-code \
  --max-num-batched-tokens 16384 \
  --no-enable-prefix-caching \
  --mm-processor-cache-gb 0

Tip: OCR workloads don't benefit much from prefix caching or image reuse, so disable those to avoid hashing/caching overhead.
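Once launched, the server exposes an OpenAI-compatible API on port 8000. Before sending traffic, you can poll the standard `/v1/models` endpoint until the model is loaded. A minimal sketch — the `wait_for_server` helper name is ours, not part of vLLM:

```python
import json
import time
import urllib.error
import urllib.request

def wait_for_server(base_url: str, timeout: float = 120.0) -> bool:
    """Poll the OpenAI-compatible /models endpoint until a model is listed."""
    url = base_url.rstrip("/") + "/models"
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    data = json.load(resp)
                    # Ready once at least one model id is reported.
                    return any(m.get("id") for m in data.get("data", []))
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(2)
    return False

# Usage: wait_for_server("http://localhost:8000/v1")
```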

Client Usage

Task-specific prompts:

from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1", timeout=3600)

TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}

response = client.chat.completions.create(
    model="PaddlePaddle/PaddleOCR-VL",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://.../receipt.png"}},
            {"type": "text", "text": TASKS["ocr"]},
        ],
    }],
    temperature=0.0,
)
print(response.choices[0].message.content)
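The example above references a remote image URL. Local files can instead be inlined as base64 data URLs, which the `image_url` content part also accepts. A sketch — `to_data_url` is our helper name, not part of the OpenAI client:

```python
import base64
import mimetypes

def to_data_url(path: str) -> str:
    """Encode a local image file as a data: URL for an image_url content part."""
    mime = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{payload}"

# Usage in a message:
# {"type": "image_url", "image_url": {"url": to_data_url("receipt.png")}}
```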

Offline Inference with PP-DocLayoutV2

Use separate virtual environments for vllm and paddlepaddle to avoid dependency conflicts. If the pipeline fails with "The model PaddleOCR-VL-0.9B does not exist.", add --served-model-name PaddleOCR-VL-0.9B to the vllm serve command.

uv pip install paddlepaddle-gpu==3.2.1 --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu126/
uv pip install -U "paddleocr[doc-parser]"
uv pip install safetensors

from paddleocr import PaddleOCRVL

pipeline = PaddleOCRVL(
    vl_rec_backend="vllm-server",
    vl_rec_server_url="http://localhost:8000/v1",
    layout_detection_model_name="PP-DocLayoutV2",
    layout_detection_model_dir="/path/to/your/PP-DocLayoutV2/",
)

output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png")
for i, res in enumerate(output):
    res.save_to_json(save_path=f"output_{i}.json")
    res.save_to_markdown(save_path=f"output_{i}.md")
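The pipeline above parses a single URL; for a folder of scans you can wrap the same `pipeline` object in a small batch driver. A sketch under the assumptions of the snippet above — `iter_documents` and `parse_directory` are hypothetical helper names, and the suffix list is illustrative:

```python
from pathlib import Path

# Assumed input types; adjust to whatever your pipeline accepts.
DOC_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".pdf"}

def iter_documents(root: str):
    """Yield document paths under root, sorted for stable output naming."""
    return sorted(p for p in Path(root).rglob("*") if p.suffix.lower() in DOC_SUFFIXES)

def parse_directory(pipeline, root: str, out_dir: str = "outputs") -> None:
    """Run the pipeline over every document found and save per-page results."""
    Path(out_dir).mkdir(exist_ok=True)
    for path in iter_documents(root):
        for i, res in enumerate(pipeline.predict(str(path))):
            stem = f"{path.stem}_{i}"
            res.save_to_json(save_path=f"{out_dir}/{stem}.json")
            res.save_to_markdown(save_path=f"{out_dir}/{stem}.md")
```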

References