Mistral Medium 3.5 logoMistral Medium 3.5
vs
Mistral Small 4 logoMistral Small 4

Mistral Medium 3.5 vs Mistral Small 4: Which is Better in 2026?

A comprehensive comparison of Mistral Medium 3.5 and Mistral Small 4 covering features, pricing, use cases, and which tool is the right choice for your needs.

⚡ Quick Verdict

Choose Mistral Medium 3.5 if:

  • You want more affordable paid plans (from $1.5/mo)
  • You need 128b dense model (merged: instruction-following + reasoning + coding) or 77.6% on swe-bench verified (beats devstral 2 and qwen3.5 397b a17b)

Choose Mistral Small 4 if:

  • You need a broader feature set (10 features vs 9)
  • You need 119b total parameters, 6b active per token (moe: 128 experts, 4 active) or unified reasoning, vision, and coding in a single model

Mistral Medium 3.5 vs Mistral Small 4: At a Glance

Attribute
Mistral Medium 3.5
Mistral Small 4
Pricing Model
Freemium
Freemium
Starting Price
Starting at $1.5/month
Open weights under Apache 2.0 license — free to download, self-host, fine-tune, and use commercially. Available via Mistral API (Mistral Small tier pricing) and Le Chat (free + Pro plans).
Free Tier
✓ Yes
✓ Yes
Category
llm-apis
llm-apis
Features Count
9 features
10 features
Shared Features
1 features in common

Pricing Comparison: Mistral Medium 3.5 vs Mistral Small 4

Understanding the pricing differences between Mistral Medium 3.5 and Mistral Small 4 is crucial for making the right choice. Here's how their plans compare side by side.

Mistral Medium 3.5 Pricing

API:$1.5/month
Starter$7.5/month
View full Mistral Medium 3.5 pricing →

Mistral Small 4 Pricing

Available via Mistral API (Mistral Small tier pricing) and Le Chat (free + Pro plans).See website
View full Mistral Small 4 pricing →

💡 Pricing takeaway: Both Mistral Medium 3.5 and Mistral Small 4 offer free tiers, making it easy to try before you buy. Compare the specific plans to find the best value for your use case.

Feature-by-Feature Comparison

Here's how every feature from Mistral Medium 3.5 and Mistral Small 4 stacks up. They share 1 features in common.

Feature
Mistral Medium 3.5
Mistral Small 4
128B dense model (merged: instruction-following + reasoning + coding)
256k token context window
77.6% on SWE-Bench Verified (beats Devstral 2 and Qwen3.5 397B A17B)
91.4 on τ³-Telecom (strong agentic capabilities)
Configurable reasoning effort per request
Vision encoder trained from scratch — handles variable image sizes and aspect ratios
Open weights under modified MIT license (self-hostable on 4 GPUs)
Powers Mistral Vibe remote coding agents and Le Chat Work mode
Async cloud coding sessions with GitHub, Linear, Jira, Sentry integrations
119B total parameters, 6B active per token (MoE: 128 experts, 4 active)
Unified reasoning, vision, and coding in a single model
Configurable reasoning effort: reasoning_effort='none' (fast) or 'high' (deep)
Native image input support (text + vision in one model)
Apache 2.0 license — permissive commercial use, no additional restrictions
40% reduction in end-to-end latency vs Mistral Small 3
3× higher throughput vs Mistral Small 3 (throughput-optimized setup)
Beats GPT-OSS 120B on AA LCR and LiveCodeBench with shorter outputs
Runs on vLLM, llama.cpp, SGLang, and Transformers

What Makes Each Tool Unique

🔵 Unique to Mistral Medium 3.5

Features available in Mistral Medium 3.5 but not in Mistral Small 4:

  • 128B dense model (merged: instruction-following + reasoning + coding)
  • 77.6% on SWE-Bench Verified (beats Devstral 2 and Qwen3.5 397B A17B)
  • 91.4 on τ³-Telecom (strong agentic capabilities)
  • Configurable reasoning effort per request
  • Vision encoder trained from scratch — handles variable image sizes and aspect ratios
  • Open weights under modified MIT license (self-hostable on 4 GPUs)
  • Powers Mistral Vibe remote coding agents and Le Chat Work mode
  • Async cloud coding sessions with GitHub, Linear, Jira, Sentry integrations

🟣 Unique to Mistral Small 4

Features available in Mistral Small 4 but not in Mistral Medium 3.5:

  • 119B total parameters, 6B active per token (MoE: 128 experts, 4 active)
  • Unified reasoning, vision, and coding in a single model
  • Configurable reasoning effort: reasoning_effort='none' (fast) or 'high' (deep)
  • Native image input support (text + vision in one model)
  • Apache 2.0 license — permissive commercial use, no additional restrictions
  • 40% reduction in end-to-end latency vs Mistral Small 3
  • 3× higher throughput vs Mistral Small 3 (throughput-optimized setup)
  • Beats GPT-OSS 120B on AA LCR and LiveCodeBench with shorter outputs
  • Runs on vLLM, llama.cpp, SGLang, and Transformers

Use Case Recommendations

Best for: Mistral Medium 3.5

Mistral's first flagship merged model, released May 22, 2026. A dense 128B model with a 256k context window that handles instruction-following, reasoning, and coding in a single set of weights. Available as open weights (modified MIT license) and powers Mistral Vibe remote coding agents and Le Chat's new Work mode. SWE-Bench Verified: 77.6%. API: $1.5/M input, $7.5/M output.

Ideal use cases:

  • Teams or individuals who need 128b dense model (merged: instruction-following + reasoning + coding)
  • Teams or individuals who need 256k token context window
  • Teams or individuals who need 77.6% on swe-bench verified (beats devstral 2 and qwen3.5 397b a17b)
  • Teams or individuals who need 91.4 on τ³-telecom (strong agentic capabilities)
  • Anyone focused on mistral workflows
  • Anyone focused on llm workflows
Try Mistral Medium 3.5

Best for: Mistral Small 4

Mistral's first unified open-source model, released March 16, 2026. A 119B MoE model (6B active parameters per token) that merges reasoning (Magistral), multimodal vision (Pixtral), and agentic coding (Devstral) into a single Apache 2.0 model. 256k context window. 40% faster and 3× higher throughput than Mistral Small 3. Beats GPT-OSS 120B on coding and reasoning benchmarks while generating shorter outputs.

Ideal use cases:

  • Teams or individuals who need 119b total parameters, 6b active per token (moe: 128 experts, 4 active)
  • Teams or individuals who need 256k token context window
  • Teams or individuals who need unified reasoning, vision, and coding in a single model
  • Teams or individuals who need configurable reasoning effort: reasoning_effort='none' (fast) or 'high' (deep)
  • Anyone focused on mistral workflows
  • Anyone focused on llm workflows
Try Mistral Small 4

🔧 Other llm-apis Tools to Consider

Mistral Medium 3.5 and Mistral Small 4 aren't the only options. Here are other popular tools in the same space:

Frequently Asked Questions

Is Mistral Medium 3.5 better than Mistral Small 4?

It depends on your needs. Mistral Medium 3.5 offers 9 key features including 128B dense model (merged: instruction-following + reasoning + coding) and 256k token context window, while Mistral Small 4 provides 10 features including 119B total parameters, 6B active per token (MoE: 128 experts, 4 active) and 256k token context window. Mistral Medium 3.5 uses a freemium model with a free tier, while Mistral Small 4 is freemium with free access available. Choose based on which features and pricing model align with your requirements.

Is Mistral Medium 3.5 cheaper than Mistral Small 4?

Mistral Small 4 doesn't have standard paid plans, while Mistral Medium 3.5 starts at $1.5/month. Both tools offer free tiers, so you can try each before committing. Always check the official websites for the most current pricing.

Can I use Mistral Medium 3.5 and Mistral Small 4 together?

Yes, many users combine Mistral Medium 3.5 and Mistral Small 4 in their workflow. Mistral Medium 3.5 excels at 128b dense model (merged: instruction-following + reasoning + coding), while Mistral Small 4 shines with 119b total parameters, 6b active per token (moe: 128 experts, 4 active). Using both allows you to leverage the strengths of each tool, though this means managing two subscriptions — though free tiers can help manage costs.

What's the main difference between Mistral Medium 3.5 and Mistral Small 4?

While both are llm-apis tools, Mistral Medium 3.5 emphasizes 128b dense model (merged: instruction-following + reasoning + coding), whereas Mistral Small 4 is known for 119b total parameters, 6b active per token (moe: 128 experts, 4 active). The best choice depends on your specific workflow and feature priorities.

Learn More

Related Comparisons