Glossary

What is mixture of experts (MoE)?

Mixture of experts is a model design that routes each token through only a few of many specialised sub-networks, giving large total capacity at a fraction of the compute per token.

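An MoE model contains many 'expert' sub-networks but activates only a small subset for any given token, chosen by a learned router. Because only the selected experts run, per-token compute scales with the number of active parameters rather than the total parameter count. This sparse activation lets the model approach the quality of a much larger dense model while keeping the cost per token closer to that of a small one.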
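To make the routing step concrete, here is a minimal sketch of top-k gating in plain Python. It is an illustration under assumptions, not any particular model's implementation: the names `route_token` and `moe_layer`, the choice of k=2, and the toy experts are all hypothetical. The router scores are passed in directly, whereas a real model would compute them from the token with a learned layer.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k=2):
    """Pick the k highest-scoring experts for one token.

    Returns [(expert_index, weight)], with weights renormalised
    so that they sum to 1 over the selected experts.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_layer(token, router_logits, experts, k=2):
    """Weighted sum of the outputs of the selected experts only.

    Experts that were not selected are never called, which is
    where the compute saving comes from.
    """
    out = [0.0] * len(token)
    for idx, weight in route_token(router_logits, k):
        y = experts[idx](token)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Toy experts (assumption): each one just scales its input.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical router scores for a single token over four experts.
logits = [0.1, 2.0, -1.0, 1.5]
chosen = route_token(logits, k=2)
output = moe_layer([1.0, -1.0], logits, experts, k=2)
```

With four experts and k=2, only half of the expert sub-networks run for this token; the other two contribute nothing and cost nothing.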

Several leading open models use the MoE design. Serving them efficiently requires routing-aware inference, since the active experts change token by token.
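One ingredient of routing-aware serving can be sketched as follows, assuming the router has already chosen experts for every token in a batch: group token positions by expert so that each active expert processes all of its tokens in one call. The function name `group_by_expert` and the sample assignments are hypothetical, and real inference engines do considerably more than this.

```python
from collections import defaultdict

def group_by_expert(assignments):
    """Invert per-token expert choices into per-expert token lists.

    assignments[t] is the list of expert indices chosen for token t.
    Returns {expert_index: [token positions]}.
    """
    groups = defaultdict(list)
    for pos, expert_ids in enumerate(assignments):
        for expert in expert_ids:
            groups[expert].append(pos)
    return dict(groups)

# Hypothetical routing result: four tokens, two experts each.
assignments = [[1, 3], [0, 1], [1, 2], [0, 3]]
batches = group_by_expert(assignments)
```

Because the assignments differ from token to token, the groups have to be rebuilt for each batch, and some experts may receive many more tokens than others.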

