Google/translategemma-27b-it

Lightweight open translation model from Google (based on Gemma 3) supporting 55 languages. Served via the vLLM-optimized Infomaniak-AI checkpoint.


dense | 27B | 131,072 ctx | vLLM 0.14.1+ | text

Guide

Overview

TranslateGemma is a family of lightweight, state-of-the-art open translation models from Google, based on the Gemma 3 family. TranslateGemma models handle translation across 55 languages and are small enough to deploy on laptops, desktops, and modest cloud GPU environments.

Original Models

Optimized vLLM Models

Why use the vLLM-optimized models?

The original Google models have compatibility issues with standard inference engines such as vLLM. The optimized versions from Infomaniak-AI make the following changes:

vLLM Compatibility: The originals require custom JSON parameters (source_lang_code/target_lang_code). The optimized version instead reads the language codes from string delimiters in the message content.
RoPE Simplification: The originals use a complex RoPE configuration for sliding attention. The optimized version uses a standard linear RoPE format (factor: 8.0).
EOS Token Fix: Corrects the EOS token from <end_of_turn> to <eos>.

Prerequisites

Docker

docker pull vllm/vllm-openai:v0.14.1-cu130

Deployment Configurations

The following configuration was verified for both the 4B and 27B models; the 27B command is shown:

docker run -itd --name google-translategemma-27b-it \
  --ipc=host \
  --network host \
  --shm-size 16G \
  --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:v0.14.1-cu130 \
    Infomaniak-AI/vllm-translategemma-27b-it \
    --served-model-name translategemma-27b-it \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8000
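
Loading a 27B checkpoint can take a while, so requests sent right after docker run may be refused. The sketch below polls the server until it answers. It assumes the container is reachable at http://localhost:8000 (matching --port 8000 above) and that the vLLM OpenAI-compatible server answers GET /health with HTTP 200 once it is ready; the function names, timeout, and interval are illustrative choices, not part of the recipe.

```python
import time
import urllib.error
import urllib.request


def server_is_ready(base_url: str = "http://localhost:8000") -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(check=server_is_ready,
                     timeout_s: float = 600.0,
                     interval_s: float = 5.0) -> bool:
    """Call check() repeatedly until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False


# Example (requires the container above to be running):
# if not wait_until_ready():
#     raise SystemExit("vLLM server did not become ready in time")
```

The check is passed in as a parameter so the polling loop can be exercised without a live server.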

Client Usage

Tips:

Prompt Delimiters: Encode language metadata directly in the content string: <<<source>>>{src_lang}<<<target>>>{tgt_lang}<<<text>>>{text}
Language Codes: ISO 639-1 Alpha-2 (e.g. en, zh) and regional variants (e.g. en_US, zh_CN).
Context Limit: ~2K tokens.
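
The delimiter format above can be assembled with a small helper. This is a minimal sketch: build_prompt is a hypothetical name, and the language-code check only encodes the two shapes listed in the tips (en, en_US). Treat that pattern as an assumption and loosen it if your deployment accepts other forms.

```python
import re

# Assumption: codes look like "en" or "en_US" (two lowercase letters,
# optionally followed by an underscore and two uppercase letters).
_LANG_CODE = re.compile(r"[a-z]{2}(_[A-Z]{2})?")


def build_prompt(src_lang: str, tgt_lang: str, text: str) -> str:
    """Encode source language, target language, and text in one content string."""
    for code in (src_lang, tgt_lang):
        if not _LANG_CODE.fullmatch(code):
            raise ValueError(f"unexpected language code: {code!r}")
    return f"<<<source>>>{src_lang}<<<target>>>{tgt_lang}<<<text>>>{text}"
```

The returned string goes into the content field of a single user message.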

cURL

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "translategemma-27b-it",
    "messages": [{
      "role": "user",
      "content": "<<<source>>>en<<<target>>>zh<<<text>>>We distribute two models for language identification, which can recognize 176 languages."
    }]
  }'
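
Python

The same request can be sent from Python with only the standard library. This sketch assumes the server from the Docker section is running on localhost:8000 under the served name translategemma-27b-it, and that the response follows the usual OpenAI-compatible chat completion shape (choices[0].message.content). The helper names and the timeout are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"   # matches --port 8000
MODEL = "translategemma-27b-it"      # matches --served-model-name


def make_request(src: str, tgt: str, text: str) -> urllib.request.Request:
    """Build the POST request for one translation."""
    payload = {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": f"<<<source>>>{src}<<<target>>>{tgt}<<<text>>>{text}",
        }],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_translation(response: dict) -> str:
    """Pull the translated text out of a chat completion response."""
    return response["choices"][0]["message"]["content"].strip()


def translate(src: str, tgt: str, text: str, timeout_s: float = 120.0) -> str:
    """Send the request and return the model's translation."""
    with urllib.request.urlopen(make_request(src, tgt, text), timeout=timeout_s) as resp:
        return extract_translation(json.load(resp))


# Example (requires the server to be running):
# print(translate("en", "zh", "We distribute two models for language identification."))
```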