microsoft/Phi-4-mini-instruct
Microsoft's Phi-4 family of lightweight dense models (mini-instruct, reasoning, multimodal) with 128K context
Overview
The Phi-4 family includes several lightweight, open models from Microsoft. These models process text (and, in the multimodal variant, inputs such as images) to generate text outputs, and support a 128K-token context length.
Prerequisites
- Hardware: 1x GPU with >=16 GB VRAM (A100, L40S, H100, etc.)
- vLLM >= 0.7.0
Install vLLM
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto
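The prerequisites above call for vLLM >= 0.7.0. A minimal sketch of checking that floor against an installed version string (the helper name is illustrative, not part of vLLM):

```python
import importlib.metadata

def meets_minimum(version: str, minimum: tuple[int, int] = (0, 7)) -> bool:
    """True if a dotted version string satisfies a (major, minor) floor."""
    major, minor = (int(p) for p in version.split(".")[:2])
    return (major, minor) >= minimum

# Inside the venv created above:
# print(meets_minimum(importlib.metadata.version("vllm")))
print(meets_minimum("0.7.0"))  # True
```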
Launch commands
Phi-4-mini-instruct on a single GPU:
vllm serve microsoft/Phi-4-mini-instruct \
--host 0.0.0.0 \
--max-model-len 4000
Phi-4-multimodal-instruct (requires --trust-remote-code for LoRA modules):
vllm serve microsoft/Phi-4-multimodal-instruct \
--host 0.0.0.0 \
--max-model-len 4000 \
--trust-remote-code
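Once a server is up, vLLM's OpenAI-compatible GET /v1/models endpoint lists what is being served, which is a quick way to confirm the launch worked. A sketch of parsing that response (the helper name and sample payload are illustrative; the endpoint itself is standard OpenAI-compatible API):

```python
import json
import urllib.request

def served_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style GET /v1/models response body."""
    return [model["id"] for model in payload.get("data", [])]

# Against a running server from the launch commands above:
# with urllib.request.urlopen("http://localhost:8000/v1/models", timeout=10) as resp:
#     print(served_model_ids(json.load(resp)))

# Shape of the response body, for reference:
sample = {"object": "list", "data": [{"id": "microsoft/Phi-4-mini-instruct", "object": "model"}]}
print(served_model_ids(sample))  # ['microsoft/Phi-4-mini-instruct']
```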
Client Usage
from openai import OpenAI
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1", timeout=3600)
response = client.chat.completions.create(
model="microsoft/Phi-4-mini-instruct",
messages=[{"role": "user", "content": "Write a short story"}],
temperature=0.0,
)
print(response.choices[0].message.content)
Multimodal (requires Phi-4-multimodal-instruct):
response = client.chat.completions.create(
model="microsoft/Phi-4-multimodal-instruct",
messages=[{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image."},
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
],
}],
)
print(response.choices[0].message.content)
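A public URL is not the only way to pass an image: the OpenAI-style chat format also accepts base64 data URLs, which is useful for local files. A sketch of building such a message (the helper name and default MIME type are illustrative):

```python
import base64

def image_message(text: str, image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build a multimodal user message embedding the image as a base64 data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# e.g., with a local file:
# with open("photo.jpg", "rb") as f:
#     messages = [image_message("Describe this image.", f.read())]
```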
Available Phi-4 Variants
- microsoft/Phi-4-mini-instruct — conversational instruction-tuned
- microsoft/Phi-4-mini-reasoning — optimized for reasoning
- microsoft/Phi-4-reasoning — advanced reasoning
- microsoft/Phi-4-multimodal-instruct — multimodal (text + image)
Benchmarking
vllm bench serve \
--model microsoft/Phi-4-mini-instruct \
--dataset-name random \
--random-input-len 2000 \
--random-output-len 512 \
--num-prompts 100
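As a rough cross-check on the numbers the benchmark reports: with --num-prompts 100 and --random-output-len 512, aggregate output throughput is just total output tokens over wall-clock time (the helper below is illustrative arithmetic, not part of vLLM):

```python
def output_throughput(num_prompts: int, output_len: int, elapsed_s: float) -> float:
    """Aggregate output tokens per second across all prompts."""
    return num_prompts * output_len / elapsed_s

# e.g., 100 prompts x 512 output tokens completed in 80 s:
print(output_throughput(100, 512, 80.0))  # 640.0
```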
Troubleshooting
- Multimodal variant fails to load: add --trust-remote-code to the serve command.