HuggingFace Papers Jul 2, 2026

ELDR: Expert-Locality-Aware Decode Routing for PD-Disaggregated MoE Serving

What happened

ELDR (Expert-Locality-Aware Decode Routing) is a routing system designed for prefill-decode disaggregated Mixture-of-Experts (MoE) serving. It improves inference performance by predicting expert activations and routing decode requests to maximize expert locality, reducing communication overhead.
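The routing idea can be sketched in a few lines. This is a minimal illustration, not ELDR's actual algorithm, which the summary above does not detail. It assumes the predicted expert set for a request is already available and that each decode worker holds a known subset of experts. `DecodeWorker`, `locality_score`, `route_decode`, and `load_penalty` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class DecodeWorker:
    """Hypothetical decode instance that holds a subset of experts locally."""
    name: str
    resident_experts: set[int]
    active_requests: int = 0


def locality_score(predicted: set[int], worker: DecodeWorker) -> float:
    """Fraction of the predicted experts that already live on this worker."""
    if not predicted:
        return 0.0
    return len(predicted & worker.resident_experts) / len(predicted)


def route_decode(predicted: set[int], workers: list[DecodeWorker],
                 load_penalty: float = 0.05) -> DecodeWorker:
    """Pick the worker with the best locality score minus a small load penalty.

    The penalty (an assumed tuning knob) keeps one worker from absorbing
    every request that happens to share popular experts.
    """
    if not workers:
        raise ValueError("no decode workers available")
    best = max(
        workers,
        key=lambda w: locality_score(predicted, w) - load_penalty * w.active_requests,
    )
    best.active_requests += 1
    return best


workers = [
    DecodeWorker("decode-0", {0, 1, 2, 3}),
    DecodeWorker("decode-1", {4, 5, 6, 7}),
]
# Predicted experts {4, 5, 1}: two of three are resident on decode-1.
chosen = route_decode({4, 5, 1}, workers)
print(chosen.name)  # decode-1
```

A real router would also have to account for prediction error, KV-cache transfer cost from the prefill side, and per-layer expert placement; the sketch only shows the locality-versus-load trade-off.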

Why it matters

ELDR targets the network and memory bottlenecks of serving large MoE models in production, where experts are spread across devices and a decode step can require cross-device communication.

The take

As MoEs become the standard architecture, serving optimization is shifting from general LLM scheduling to expert-level routing. Prefill-decode disaggregation is already standard in high-throughput setups; ELDR targets the network bottleneck inherent in distributed MoE decoding.

Do this

Infrastructure engineers hosting open-weights MoEs (like Mixtral or DeepSeek) should monitor ELDR's routing strategies for integration into custom serving stacks.

