zai-org/GLM-OCR
GLM-OCR image-to-text model with built-in MTP speculative decoding for high-throughput OCR serving
Overview
GLM-OCR is a vision-language model for end-to-end OCR. It ships with built-in Multi-Token Prediction (MTP) layers that enable speculative decoding for higher-throughput generation.
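The idea behind MTP speculative decoding can be illustrated with a toy sketch (hypothetical integer-token "models", not the GLM-OCR implementation): a cheap draft head proposes several next tokens, and the main model verifies them, keeping the longest agreeing prefix plus one token of its own.

```python
def speculative_step(target, draft, prefix, k):
    """One round of speculative decoding with greedy verification.

    `target` and `draft` are toy next-token functions (sequence -> token).
    The draft proposes k tokens; the target checks each one and keeps
    the longest agreeing prefix, plus one corrected token on mismatch.
    """
    # Draft phase: cheaply propose k candidate tokens.
    proposed = []
    seq = list(prefix)
    for _ in range(k):
        t = draft(seq)
        proposed.append(t)
        seq.append(t)

    # Verify phase: the target checks the candidates (in a real engine,
    # all in one forward pass) and accepts matches greedily.
    accepted = list(prefix)
    for t in proposed:
        expected = target(accepted)
        if t == expected:
            accepted.append(t)
        else:
            accepted.append(expected)  # target's own token replaces the miss
            break
    else:
        accepted.append(target(accepted))  # bonus token when all k match
    return accepted

# Toy models over integer tokens: target emits len(seq) % 5; the draft
# agrees except when the correct next token would be 3.
target = lambda seq: len(seq) % 5
draft = lambda seq: len(seq) % 5 if len(seq) % 5 != 3 else 0

print(speculative_step(target, draft, [0, 1], 3))  # → [0, 1, 2, 3]
```

When the draft agrees often (as a trained MTP head does), several tokens are committed per target pass, which is where the throughput gain comes from.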
Prerequisites
- vLLM version: nightly recommended (or latest stable with MTP support)
- Transformers: >= 5.0.0 (install from source for latest)
Install vLLM
```shell
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto
```

Or install the nightly build:

```shell
uv pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```

Then install Transformers from source:

```shell
uv pip install git+https://github.com/huggingface/transformers.git
```
Launching the Server
With MTP Speculative Decoding
```shell
vllm serve zai-org/GLM-OCR \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1
```
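If your vLLM build does not accept the dotted CLI form, the same speculative config can be passed as a JSON string (the form documented in the vLLM CLI reference):

```shell
vllm serve zai-org/GLM-OCR \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 1}'
```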
Client Usage
OpenAI SDK
```python
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1", timeout=3600)

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"}},
        {"type": "text", "text": "Text Recognition:"},
    ],
}]

response = client.chat.completions.create(
    model="zai-org/GLM-OCR",
    messages=messages,
    max_tokens=2048,
    temperature=0.0,
)
print(response.choices[0].message.content)
```
cURL
```shell
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-OCR",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/receipt.png"}},
        {"type": "text", "text": "Text Recognition:"}
      ]
    }],
    "max_tokens": 2048,
    "temperature": 0.0
  }'
```
Troubleshooting
- Greedy sampling recommended: use `temperature=0.0` for optimal OCR accuracy.
- Transformers version: requires `transformers >= 5.0.0`.
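To confirm the environment meets the version requirements above, a quick self-contained check (the helper below is our own, not part of vLLM or Transformers):

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Map each package name to its installed version string, or None if absent."""
    out = {}
    for pkg in packages:
        try:
            out[pkg] = version(pkg)
        except PackageNotFoundError:
            out[pkg] = None
    return out

print(installed_versions(("vllm", "transformers")))
```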