
Mixtral 8x22B

Mistral's largest open-weights MoE — 141B total / 39B active, Apache 2.0

Freemium · DR 86

Open weights under Apache 2.0: free to download and self-host from Hugging Face (mistralai/Mixtral-8x22B-Instruct-v0.1). API access via Mistral La Plateforme cost ~$2/M input tokens and ~$6/M output tokens at launch. Self-hosting requires multi-GPU infrastructure (~4× A100 80GB minimum).
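
As a rough illustration of the launch-era API prices quoted above, the sketch below estimates the cost of a workload in Python. The function name and the example token counts are hypothetical, and the prices are the approximate figures from this listing, not current rates.

```python
# Approximate launch-era list prices from this listing (USD per million tokens).
INPUT_USD_PER_M = 2.0
OUTPUT_USD_PER_M = 6.0


def estimate_api_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the approximate API cost in USD for a given token volume."""
    input_cost = (input_tokens / 1_000_000) * INPUT_USD_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_USD_PER_M
    return input_cost + output_cost


# Hypothetical workload: 50M input tokens and 10M output tokens per month.
monthly = estimate_api_cost(50_000_000, 10_000_000)
print(f"${monthly:,.2f}")  # 50 * 2 + 10 * 6 = 160 -> $160.00
```

Comparing a figure like this against the cost of renting a multi-GPU node is one way to decide between the API and self-hosting.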

Visit Mixtral 8x22B

https://mistral.ai/news/mixtral-8x22b

About Mixtral 8x22B

Mixtral 8x22B is Mistral AI's largest open-weights mixture-of-experts model, released April 17, 2024. It uses a sparse MoE architecture with 141B total parameters and ~39B active per token (8 expert groups of 22B, with 2 experts routed per token; the total is below 8 × 22B = 176B because the attention layers are shared across experts). At launch it was the strongest open-weight model on reasoning, math, and coding benchmarks, outperforming LLaMA 2 70B and GPT-3.5 Turbo on most tasks. It supports a 64k-token context window, is natively multilingual (English, French, German, Italian, Spanish), and offers function calling and JSON mode. The weights are released under Apache 2.0 on Hugging Face.
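
The "2 experts per token" routing described above can be sketched in a few lines of Python. This is a toy illustration of the common top-2 gating pattern, not Mistral's implementation: the real router is a learned layer over hidden vectors and the experts are feed-forward blocks, whereas here the logits are hard-coded and the experts (hypothetical names `top2_route`, `moe_layer`) simply scale a number.

```python
import math


def top2_route(router_logits):
    """Pick the 2 highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:2]
    # Softmax over only the selected logits, so the two weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_layer(x, router_logits, experts):
    """Run only the two selected experts and mix their outputs."""
    return sum(w * experts[i](x) for i, w in top2_route(router_logits))


# Toy setup: 8 "experts", each multiplying its input by a different factor.
experts = [lambda x, k=k: x * (k + 1) for k in range(8)]
logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 0.2]

print(top2_route(logits))               # experts 1 and 4, weight 0.5 each
print(moe_layer(1.0, logits, experts))  # 0.5 * 2 + 0.5 * 5 = 3.5
```

Only two of the eight expert functions are called per input, which is the reason the active parameter count (~39B) is far smaller than the total (141B).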

Key Features

141B total parameters, ~39B active per token (8 expert groups of 22B, 2 routed per token)
65,536 token (64k) context window
Function calling and JSON mode support
Multilingual: English, French, German, Italian, Spanish
Apache 2.0 license — free for commercial use, modification, redistribution
State-of-the-art open-weight reasoning at launch: beats LLaMA 2 70B and GPT-3.5 on MATH, HumanEval, MMLU
Efficient inference: ~39B active params means faster throughput than a dense 141B model
Compatible with vLLM, llama.cpp, TGI, Ollama, and other inference frameworks

Mixtral 8x22B Pros & Cons

Pros

  • Best open-weight performance at launch: top of open-source leaderboards on reasoning, MATH, and coding
  • MoE efficiency: 141B total params with only ~39B active means faster tokens-per-second than a comparably-performing dense model
  • Apache 2.0 license: fully permissive, fine-tune and deploy without restrictions
  • 64k context window covers long-document tasks most 2024 open models couldn't handle
  • Multilingual out-of-the-box across five European languages

Cons

  • Heavy self-hosting requirements: needs ~4× A100 80GB GPUs or equivalent for inference
  • Superseded for most use cases by Mistral Small 4 and newer open-weight models
  • No multimodal / vision support — text-only
  • Commercial API has been deprecated in favor of newer Mistral models on La Plateforme
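
The ~4× A100 80GB figure follows from simple arithmetic on the weights, sketched below in Python. This is a back-of-the-envelope estimate under stated assumptions: it counts weight memory only (no KV cache, activations, or framework overhead), treats 1 GB as 1e9 bytes, and the quantized rows are hypothetical configurations rather than tested deployments. All 141B parameters must be resident even though only ~39B are active per token.

```python
import math

TOTAL_PARAMS = 141e9   # total parameter count quoted in this listing
GPU_MEMORY_GB = 80     # one A100 80GB


def weight_memory_gb(bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return TOTAL_PARAMS * bytes_per_param / 1e9


def min_gpus(bytes_per_param: float) -> int:
    """Smallest GPU count whose combined memory holds the weights."""
    return math.ceil(weight_memory_gb(bytes_per_param) / GPU_MEMORY_GB)


for label, nbytes in [("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"{label}: {weight_memory_gb(nbytes):.1f} GB -> {min_gpus(nbytes)} GPU(s)")
# 16-bit: 282.0 GB -> 4 GPU(s)
# 8-bit: 141.0 GB -> 2 GPU(s)
# 4-bit: 70.5 GB -> 1 GPU(s)
```

In practice the lower-precision rows leave little or no headroom for the KV cache, so real deployments typically need more memory than these minimums suggest.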

Who Is Mixtral 8x22B Best For?

  • Teams running self-hosted inference who need the best open-weight text model from the 2024 era
  • Researchers benchmarking or fine-tuning large MoE architectures
  • Multilingual applications across English, French, German, Italian, and Spanish
  • Cost-sensitive deployments where Apache 2.0 licensing and self-hosting beat API pricing

Tags

mistral, mixtral, llm, api, open-source, open-weights, moe, mixture-of-experts, self-hosted, multilingual, coding, reasoning
