jinaai/jina-reranker-m0
Multilingual, multimodal reranker for text and visual documents in 29+ languages, built on a Qwen2-VL backbone
Overview
jinaai/jina-reranker-m0 is a multilingual, multimodal reranker that ranks visual documents across 29+ languages. It accepts text and visual content, including pages with mixed text, figures, tables, and various layouts.
Deployment target: 2x NVIDIA T4 or 2x NVIDIA L4.
Prerequisites
- Hardware: 2x T4 or 2x L4 (or any two GPUs with at least 16 GB of memory each)
- vLLM >= 0.8.0
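A quick way to confirm the installed vLLM meets the 0.8.0 minimum before launching is sketched below. `parse_release` and `meets_minimum` are hypothetical helpers, and the parser keeps only the leading numeric release segments. As a result, suffixes such as `rc1` or `.post1` are ignored rather than ordered per PEP 440, so treat this as a rough check.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VLLM = "0.8.0"  # minimum stated in the prerequisites


def parse_release(v: str) -> tuple[int, ...]:
    """Leading numeric release segments, e.g. '0.10.1rc1' -> (0, 10, 1)."""
    parts = []
    for seg in v.split("."):
        m = re.match(r"\d+", seg)
        if not m:
            break  # non-numeric segment such as 'post1'
        parts.append(int(m.group()))
        if m.end() != len(seg):
            break  # segment like '1rc1': keep its number, then stop
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_VLLM) -> bool:
    a, b = parse_release(installed), parse_release(minimum)
    # pad with zeros so (0, 8) compares equal to (0, 8, 0)
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)) >= b + (0,) * (n - len(b))


if __name__ == "__main__":
    try:
        v = version("vllm")
        status = "OK" if meets_minimum(v) else f"too old, need >= {MIN_VLLM}"
        print(f"vllm {v}: {status}")
    except PackageNotFoundError:
        print("vllm is not installed")
```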
Install vLLM (CUDA)
uv pip install vllm
Install vLLM (AMD ROCm)
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
Launch command
vllm serve jinaai/jina-reranker-m0 \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 2 \
--gpu-memory-utilization 0.75 \
--max-num-seqs 32
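`--gpu-memory-utilization` is the fraction of each GPU's memory that vLLM may claim for weights, activations, and KV cache. The sketch below does the per-GPU arithmetic for the two target cards. It assumes their nominal capacities (16 GB for T4, 24 GB for L4), and `per_gpu_budget_gb` is a hypothetical helper. Actual headroom is somewhat lower because of the CUDA context and any other processes on the GPU.

```python
def per_gpu_budget_gb(total_gb: float, utilization: float) -> float:
    """Memory (GB) vLLM may use on one GPU at the given --gpu-memory-utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return total_gb * utilization


# nominal capacities; the usable memory reported by the driver is slightly lower
for name, total in {"T4": 16.0, "L4": 24.0}.items():
    budget = per_gpu_budget_gb(total, 0.75)
    print(f"{name}: {budget:.1f} GB per GPU, {2 * budget:.1f} GB across 2 GPUs")
```

At 0.75 this leaves roughly 12 GB per T4 and 18 GB per L4. With tensor parallelism the limit applies to each GPU separately.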
On AMD:
export VLLM_ROCM_USE_AITER=1
vllm serve jinaai/jina-reranker-m0 \
--tensor-parallel-size 2 --gpu-memory-utilization 0.75 --max-num-seqs 32
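Downloading and loading the model can take a while, so clients should wait until the server is ready before sending requests. The polling sketch below assumes the `/health` endpoint of vLLM's OpenAI-compatible server, which answers HTTP 200 once the engine is up. `wait_for_server` is a hypothetical helper, and its 600 s default timeout is an arbitrary choice.

```python
import time
import urllib.request


def wait_for_server(base_url: str = "http://localhost:8000",
                    timeout_s: float = 600.0, interval_s: float = 2.0) -> bool:
    """Poll GET {base_url}/health until it answers HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # connection refused, socket timeout, or HTTP error: not ready yet
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Call `wait_for_server()` once before the rerank or score requests. A `False` return means the server did not become ready within the timeout.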
Rerank API
curl -X POST http://localhost:8000/v1/rerank \
-H "Content-Type: application/json" \
-d '{
"model": "jinaai/jina-reranker-m0",
"query": "What are the health benefits of green tea?",
"documents": [
"Green tea contains antioxidants called catechins...",
"El precio del café ha aumentado un 20% este año...",
"Studies show that drinking green tea regularly..."
],
"top_n": 3,
"return_documents": true
}'
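The same rerank request can be sent from Python using only the standard library. `build_rerank_payload`, `post_rerank`, and `ranked` are hypothetical helper names. `ranked` assumes the response carries a `results` list whose entries have `index` and `relevance_score` fields, which is the Jina/Cohere-style shape of vLLM's rerank endpoint. Check this against your vLLM version.

```python
import json
import urllib.request
from typing import Optional

MODEL = "jinaai/jina-reranker-m0"


def build_rerank_payload(query: str, documents: list[str],
                         top_n: Optional[int] = None) -> dict:
    """Request body for POST /v1/rerank, mirroring the curl example."""
    payload = {"model": MODEL, "query": query, "documents": documents}
    if top_n is not None:
        payload["top_n"] = top_n
    return payload


def post_rerank(base_url: str, payload: dict, timeout_s: float = 60.0) -> dict:
    """Send the payload as JSON and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)


def ranked(response: dict) -> list[tuple[int, float]]:
    """(document index, relevance_score) pairs, highest score first."""
    pairs = [(r["index"], r["relevance_score"]) for r in response["results"]]
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```

For example, `ranked(post_rerank("http://localhost:8000", build_rerank_payload(query, docs, top_n=3)))` returns indices into `docs`, ordered from most to least relevant.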
Score API
Text-to-text:
curl -X POST http://localhost:8000/v1/score \
-H "Content-Type: application/json" \
-d '{
"model": "jinaai/jina-reranker-m0",
"text_1": ["What is the capital of Brazil?"],
"text_2": ["The capital of Brazil is Brasilia."]
}'
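A Python counterpart to the text-to-text score call is sketched below. `build_score_payload`, `post_score`, and `scores_in_order` are hypothetical helpers. `scores_in_order` assumes the response holds a `data` list of objects with `index` and `score` fields, one per `text_1`/`text_2` pair.

```python
import json
import urllib.request
from typing import Union

MODEL = "jinaai/jina-reranker-m0"
TextInput = Union[str, list[str]]


def build_score_payload(text_1: TextInput, text_2: TextInput) -> dict:
    """Request body for POST /v1/score; each side may be one string or a list."""
    return {"model": MODEL, "text_1": text_1, "text_2": text_2}


def post_score(base_url: str, payload: dict, timeout_s: float = 60.0) -> dict:
    """Send the payload as JSON and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/score",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)


def scores_in_order(response: dict) -> list[float]:
    """One score per pair, ordered by the pair's `index`."""
    return [d["score"] for d in sorted(response["data"], key=lambda d: d["index"])]
```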
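Multimodal (text vs. images), sent as the JSON body of a POST to /v1/score. The image URLs below are placeholders: use HTTP(S) URLs, base64 data URLs, or file:// URLs inside a directory the server allows via --allowed-local-media-path.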
{
"model": "jinaai/jina-reranker-m0",
"text_1": "A cat",
"text_2": {
"content": [
{"type": "image_url", "image_url": {"url": "cat_img.jpg"}},
{"type": "image_url", "image_url": {"url": "dog_img.jpg"}}
]
}
}
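Local images can be inlined as base64 data URLs, which keeps the request self-contained. The sketch below assumes vLLM accepts `data:<mime>;base64,...` values in `image_url.url`, as it does for chat requests. `image_data_url` and `multimodal_score_payload` are hypothetical helpers that build the same body shape as the JSON above.

```python
import base64
import mimetypes
from pathlib import Path


def image_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL for an image_url entry."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def multimodal_score_payload(query: str, image_paths: list[str]) -> dict:
    """Score one text query against several local images."""
    return {
        "model": "jinaai/jina-reranker-m0",
        "text_1": query,
        "text_2": {
            "content": [
                {"type": "image_url", "image_url": {"url": image_data_url(p)}}
                for p in image_paths
            ]
        },
    }
```

Inlining makes large images inflate the request size. For many or large files, serving them over HTTP may be preferable.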
Offline Deployment
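from vllm import LLM

# Main guard: with tensor_parallel_size > 1, vLLM may start worker processes
# via spawn, which re-imports this script.
if __name__ == "__main__":
    llm = LLM(
        model="jinaai/jina-reranker-m0",
        tensor_parallel_size=2,
        gpu_memory_utilization=0.75,
        max_model_len=1024,
        max_num_seqs=32,
        kv_cache_dtype="fp8",
        # bfloat16 needs compute capability >= 8.0 (fine on L4); on T4 use dtype="float16"
        dtype="bfloat16",
    )

    # Score one query against each document; results come back in document order
    res = llm.score("fast recipes for weeknight dinners", [
        "A 15-minute pasta with garlic and olive oil.",
        "Slow braised short ribs that cook for 5 hours.",
        "Stir-fry veggies with pre-cooked rice.",
    ])
    for item in res:
        print(item.outputs.score)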
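`llm.score` returns one result per document, in the same order as the input list. Ranking therefore only requires pairing each result with its document and sorting. `rank_documents` below is a hypothetical helper that relies only on the `item.outputs.score` attribute used in the loop above.

```python
from typing import Any, Sequence


def rank_documents(documents: Sequence[str],
                   outputs: Sequence[Any]) -> list[tuple[float, str]]:
    """Pair each document with outputs[i].outputs.score and sort best-first."""
    if len(documents) != len(outputs):
        raise ValueError("expected exactly one score per document")
    scored = [(out.outputs.score, doc) for doc, out in zip(documents, outputs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

If `docs` is the list passed to `llm.score` and `res` is its return value, `rank_documents(docs, res)[0]` gives the best-scoring document together with its score.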