tencent/HunyuanOCR
Tencent Hunyuan end-to-end OCR expert VLM (~1B) for online OCR serving with an OpenAI-compatible API
Overview
HunyuanOCR is a leading end-to-end OCR expert VLM powered by Hunyuan's native multimodal architecture. This recipe covers online serving with the OpenAI-compatible API.
Prerequisites
- vLLM version: latest stable
- Hardware: single GPU (1B model)
Install vLLM
```shell
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto
```
Launching the Server
```shell
vllm serve tencent/HunyuanOCR \
  --no-enable-prefix-caching \
  --mm-processor-cache-gb 0
```
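Once the server logs report that startup is complete, you can verify it is reachable before wiring up a client (this assumes the default port 8000; adjust if you passed `--port`):

```shell
# List the served models; the response should include "tencent/HunyuanOCR".
curl http://localhost:8000/v1/models
```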
Configuration Tips
- Use greedy sampling (`temperature=0.0`) or a low temperature for optimal OCR accuracy.
- OCR tasks generally do not benefit from prefix caching or image reuse; disabling them (as above) removes hashing/caching overhead.
- Adjust `--max-num-batched-tokens` for throughput based on your hardware.
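Putting these tips together, a tuned launch might look like the following (the token budget of 16384 is an illustrative value to adapt to your GPU, not a recommendation from the model authors):

```shell
vllm serve tencent/HunyuanOCR \
  --no-enable-prefix-caching \
  --mm-processor-cache-gb 0 \
  --max-num-batched-tokens 16384
```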
Client Usage
```python
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1", timeout=3600)

messages = [
    {"role": "system", "content": ""},
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/tools-dark.png"
                },
            },
            {
                "type": "text",
                "text": (
                    "Extract all information from the main body of the document image "
                    "and represent it in markdown format, ignoring headers and footers. "
                    "Tables should be expressed in HTML format, formulas in the document "
                    "should be represented using LaTeX format, and the parsing should be "
                    "organized according to the reading order."
                ),
            },
        ],
    },
]

response = client.chat.completions.create(
    model="tencent/HunyuanOCR",
    messages=messages,
    temperature=0.0,
    extra_body={"top_k": 1, "repetition_penalty": 1.0},
)
print(response.choices[0].message.content)
```
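Local document scans can be sent inline as base64 data URLs instead of remote links. A minimal helper is sketched below; the function name and the MIME-type handling are illustrative, not part of vLLM or the model:

```python
import base64
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data: URL usable in an image_url part."""
    suffix = Path(path).suffix.lstrip(".").lower() or "png"
    mime = {"jpg": "jpeg"}.get(suffix, suffix)  # .jpg files use the image/jpeg MIME type
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:image/{mime};base64,{encoded}"
```

The returned string drops into the same message structure as above, e.g. `{"type": "image_url", "image_url": {"url": image_to_data_url("scan.png")}}`.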
Troubleshooting
- Accuracy: Use `temperature=0.0` and `top_k=1` for deterministic OCR output.
- Application-oriented prompts: See the official model card for prompts tuned to various document parsing tasks.