InterleaveThinker
Interleaved Generation Agentic Reasoning

InterleaveThinker: Reinforcing Agentic Interleaved Generation

We introduce InterleaveThinker, a multi-agent pipeline that endows existing image generators with interleaved generation capabilities through planning, critique, and step-wise instruction refinement.

Dian Zheng1,2* Harry Lee Manyuan Zhang2† Kaituo Feng1 Zoey Guo3 Ray Zhang1 Hongsheng Li1✉

1CUHK MMLab · 2Meituan · 3CUHK IMIXR

*Work done while Dian Zheng was an intern at Meituan · †Project Leader · ✉Corresponding Author

InterleaveThinker teaser figure
UEval: 66.3 (base: 18.2)
CoMM: 8.6 (base: 0/10)
WISE: 0.73 (base: 0.47/1)
RISE: 28.9 (base: 13.3)

Abstract

Interleaved generation asks a system to produce coherent text-image sequences across multiple dependent steps, where each visual result must respect both the current instruction and the accumulated history.

We introduce InterleaveThinker, a multi-agent pipeline designed to endow existing image generators with interleaved generation capabilities. A planner agent organizes the image-text input sequence and decomposes it into executable generation steps. A critic agent evaluates generator outputs, identifies deviations, and refines instructions for subsequent generation.
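The planner/critic division of labor can be pictured as a plan-once, critique-per-step loop. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the `Step` and `Critique` schemas, the `run_pipeline` function, the `max_retries` budget, and the string "images" produced by the toy agents are all hypothetical stand-ins. It only shows the control flow the abstract describes: the planner decomposes the request, the generator produces each step conditioned on the accumulated history, and the critic either accepts the output or rewrites the instruction for a retry.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One executable generation step emitted by the planner (hypothetical schema)."""
    instruction: str


@dataclass
class Critique:
    """Critic verdict on one generator output (hypothetical schema)."""
    passed: bool
    refined_instruction: Optional[str] = None


def run_pipeline(
    plan: Callable[[str], List[Step]],
    generate: Callable[[str, List[str]], str],
    critique: Callable[[Step, str, List[str]], Critique],
    request: str,
    max_retries: int = 2,
) -> List[str]:
    """Plan once, then generate each step with at most `max_retries` critic refinements."""
    history: List[str] = []  # accumulated outputs, passed as context to later steps
    for step in plan(request):
        instruction = step.instruction
        output = generate(instruction, history)
        for _ in range(max_retries):
            verdict = critique(step, output, history)
            if verdict.passed or verdict.refined_instruction is None:
                break
            # The critic found a deviation and rewrote the instruction; regenerate.
            instruction = verdict.refined_instruction
            output = generate(instruction, history)
        # The last output is kept even if the retry budget ran out.
        history.append(output)
    return history


# Toy agents (hypothetical): outputs are strings standing in for images.
def toy_plan(request: str) -> List[Step]:
    return [Step(part.strip()) for part in request.split(";")]


def toy_generate(instruction: str, history: List[str]) -> str:
    return f"<image: {instruction} | ctx={len(history)}>"


def toy_critique(step: Step, output: str, history: List[str]) -> Critique:
    # Pretend the user asked for red objects: reject anything not red.
    if "red" in output:
        return Critique(passed=True)
    return Critique(passed=False, refined_instruction=step.instruction + ", colored red")


frames = run_pipeline(toy_plan, toy_generate, toy_critique, "a cat; the cat jumps")
```

Here each step is first rejected, refined once by the toy critic, and then accepted, and the `ctx` counter shows that the second step was conditioned on the first step's output.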

We build dedicated training datasets for planner supervised fine-tuning, critic supervised fine-tuning, and critic reinforcement learning. Trained with GRPO using the proposed accuracy and step-wise rewards, InterleaveThinker learns to perform step-aware correction and transfers across multiple image generation backends. We further convert our data into real interleaved data in two modes: (1) a simple version without reflection and (2) a hard version with reflection.
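GRPO scores a group of sampled responses to the same prompt and normalizes each reward against the group's mean and standard deviation, so no learned value model is needed. The page does not spell out the accuracy and step-wise rewards, so `combined_reward` below is a hypothetical placeholder: it assumes an accuracy term (did the critic's overall verdict match the label?) plus a weighted step-wise term (the fraction of steps the critic judged correctly), with an assumed weight `step_weight`. Only the group-relative normalization in `group_relative_advantages` reflects the standard GRPO recipe.

```python
import statistics
from typing import List, Sequence


def combined_reward(verdict_correct: bool, step_hits: Sequence[bool], step_weight: float = 0.5) -> float:
    """Hypothetical critic reward: accuracy term plus a weighted step-wise term."""
    accuracy = 1.0 if verdict_correct else 0.0
    step_score = sum(step_hits) / len(step_hits) if step_hits else 0.0
    return accuracy + step_weight * step_score


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division defined when every rollout gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical critic rollouts for the same interleaved sample.
rewards = [
    combined_reward(True, [True, True]),    # right verdict, both steps right
    combined_reward(True, [True, False]),   # right verdict, one step right
    combined_reward(False, [False, False]), # wrong verdict, no steps right
    combined_reward(False, [True, False]),  # wrong verdict, one step right
]
advantages = group_relative_advantages(rewards)
```

Rollouts above the group mean receive positive advantages and are reinforced, while those below are pushed down. Because the step-wise term separates rollouts that share the same final verdict, it gives the policy a finer signal than accuracy alone, which is one plausible reading of why a step-wise reward helps step-aware correction.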

Method

InterleaveThinker decouples high-level interleaved reasoning from low-level image synthesis through a multi-agent workflow. The agents plan and correct the sequence, while the image generator performs each generation or editing action.
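One way to read this decoupling is as a narrow interface between the agents and the generator: the agents emit "generate" or "edit" actions, and any backend that can perform those two operations can be plugged in. The sketch below is a hypothetical illustration, not the project's API: `ImageBackend`, `GenStep`, `execute`, and the toy `EchoBackend` (which returns strings instead of pixels) are all assumed names.

```python
from dataclasses import dataclass
from typing import List, Literal, Optional, Protocol

Action = Literal["generate", "edit"]


@dataclass
class GenStep:
    """One low-level action requested by the agents (hypothetical schema)."""
    action: Action
    prompt: str
    source_index: Optional[int] = None  # which earlier image to edit, if any


class ImageBackend(Protocol):
    """Hypothetical interface an image generator exposes to be plugged in."""

    def generate(self, prompt: str) -> str: ...

    def edit(self, image: str, prompt: str) -> str: ...


def execute(steps: List[GenStep], backend: ImageBackend) -> List[str]:
    """Dispatch each step to the backend; the agents never touch pixels directly."""
    images: List[str] = []
    for step in steps:
        if step.action == "edit":
            if step.source_index is None:
                raise ValueError("an edit step needs a source image")
            images.append(backend.edit(images[step.source_index], step.prompt))
        else:
            images.append(backend.generate(step.prompt))
    return images


class EchoBackend:
    """Toy backend: records what it was asked to do instead of rendering."""

    def generate(self, prompt: str) -> str:
        return f"gen[{prompt}]"

    def edit(self, image: str, prompt: str) -> str:
        return f"edit[{image} -> {prompt}]"


result = execute(
    [GenStep("generate", "a red ball"), GenStep("edit", "make it blue", source_index=0)],
    EchoBackend(),
)
```

Under this assumed design, switching to a different generator means passing a different object to `execute`, while the planner and critic stay unchanged.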

InterleaveThinker method overview

BibTeX

@article{zheng2026interleavethinker,
  title   = {InterleaveThinker: Reinforcing Agentic Interleaved Generation},
  author  = {Zheng, Dian and Lee, Harry and Zhang, Manyuan and Feng, Kaituo and Guo, Zoey and Zhang, Ray and Li, Hongsheng},
  journal = {arXiv preprint arXiv:2606.13679},
  year    = {2026}
}