A mixture-of-experts (MoE) model contains many 'expert' sub-networks but activates only a small subset of them for any given token, chosen by a learned router. This sparse activation gives the model the capacity, and much of the quality, of a very large model while keeping the compute cost per token closer to that of a small one.
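The routing step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not any particular model's implementation: the router is assumed to be a single linear layer, the top-k scores are assumed to be renormalized with a softmax over the selected experts only (some designs instead apply softmax over all experts first), and the experts are toy scaling functions. The names `route_top_k`, `moe_forward`, and `make_expert` are hypothetical.

```python
import math
import random

def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Assumption: weights are a softmax over the selected logits only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def make_expert(scale):
    # Toy stand-in for an expert feed-forward network (hypothetical).
    return lambda token: [scale * x for x in token]

def moe_forward(token, router, experts, k=2):
    # Assumed linear router: one weight row per expert, dotted with the token.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    chosen = route_top_k(logits, k)
    out = [0.0] * len(token)
    # Only the k chosen experts run; the rest cost nothing for this token.
    for idx, weight in chosen:
        y = experts[idx](token)
        out = [o + weight * v for o, v in zip(out, y)]
    return out, [idx for idx, _ in chosen]

random.seed(0)
NUM_EXPERTS, DIM = 8, 4
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
experts = [make_expert(i + 1) for i in range(NUM_EXPERTS)]
token = [0.5, -1.0, 0.25, 2.0]

output, active = moe_forward(token, router, experts, k=2)
print("active experts:", active)
```

With 8 experts and `k=2`, each token pays for two expert evaluations plus the router, even though all 8 experts' parameters exist in the model.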
Several leading open models use the MoE design. Serving them efficiently requires routing-aware inference, since the active experts change token by token.
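One ingredient of routing-aware serving can be illustrated by grouping the tokens in a batch by their assigned expert, so that each active expert runs once on its own sub-batch instead of once per token. The sketch below is an assumption-laden toy, not a description of any specific inference engine: `group_by_expert`, `dispatch`, and `make_batch_expert` are hypothetical names, and the routing assignments are hard-coded rather than produced by a learned router.

```python
from collections import defaultdict

def group_by_expert(assignments):
    """assignments[t] lists (expert_index, weight) pairs for token t.
    Returns {expert_index: [(token_index, weight), ...]}."""
    groups = defaultdict(list)
    for token_idx, routes in enumerate(assignments):
        for expert_idx, weight in routes:
            groups[expert_idx].append((token_idx, weight))
    return dict(groups)

def dispatch(tokens, assignments, experts):
    outputs = [[0.0] * len(tok) for tok in tokens]
    for expert_idx, members in sorted(group_by_expert(assignments).items()):
        # Gather this expert's tokens and run the expert once on the sub-batch.
        batch = [tokens[t] for t, _ in members]
        results = experts[expert_idx](batch)
        # Scatter the weighted results back to each token's output slot.
        for (t, weight), y in zip(members, results):
            outputs[t] = [o + weight * v for o, v in zip(outputs[t], y)]
    return outputs

def make_batch_expert(scale):
    # Toy batched expert (hypothetical): scales every token in the batch.
    return lambda batch: [[scale * x for x in tok] for tok in batch]

tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Hard-coded routing: each token picks different experts with different weights.
assignments = [
    [(0, 0.5), (2, 0.5)],
    [(2, 1.0)],
    [(0, 0.25), (1, 0.75)],
]
experts = [make_batch_expert(s) for s in (1.0, 2.0, 3.0)]

outputs = dispatch(tokens, assignments, experts)
print(outputs)
```

Because the groups depend on the router's choices, their sizes change from batch to batch, which is why a serving stack cannot assume a fixed amount of work per expert.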