Discrete Flow Matching
KAIST Visual AI Group
Overview
This project investigates the core differences between continuous and discrete flow matching for generative modeling. Developed during an internship at the KAIST Visual AI Group, the work builds a testbed for comparing the two approaches on discrete data, including MNIST and sketch datasets.
Key Research Question
Why use flow matching over autoregressive transformers for sketch generation?
Motivation
| Autoregressive Transformers | Flow Matching |
|---|---|
| Sequential token generation | Parallel generation |
| Cannot revise past decisions | Iterative refinement |
| Limited task flexibility | Flexible conditioning |
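To make the "parallel generation" row concrete, here is a minimal sketch of a masked-source sampler, assuming a linear schedule in which every masked position is revealed with probability 1 / (remaining steps) at each step. The names `MASK`, `toy_denoiser`, and `sample_parallel` are hypothetical, and the uniform "denoiser" is only a stand-in for a trained network; this is not the project's implementation. In this masked-source variant a revealed token is not revisited, so revising earlier decisions would need something extra, such as a uniform source distribution or corrector steps.

```python
import random

MASK = -1  # hypothetical id reserved for the mask token


def toy_denoiser(tokens, vocab_size):
    """Stand-in for a trained model: a uniform distribution per position."""
    return [[1.0 / vocab_size] * vocab_size for _ in tokens]


def sample_parallel(length, vocab_size, num_steps, rng):
    """Start fully masked and reveal tokens at all positions in parallel."""
    tokens = [MASK] * length
    for step in range(num_steps):
        # One model call scores every position at once (no left-to-right order).
        probs = toy_denoiser(tokens, vocab_size)
        # Euler step for a linear schedule: h / (1 - t) = 1 / (num_steps - step).
        p_unmask = 1.0 / (num_steps - step)
        for i in range(length):
            if tokens[i] == MASK and rng.random() < p_unmask:
                tokens[i] = rng.choices(range(vocab_size), weights=probs[i])[0]
    return tokens


sample = sample_parallel(length=16, vocab_size=8, num_steps=10, rng=random.Random(0))
```

The number of model calls is `num_steps`, independent of sequence length, whereas an autoregressive decoder needs one call per token.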
Approach
Dual Modality Architecture
- CNN encoder for rendered sketch images (UDF representation)
- Transformer encoder for stroke sequences
Key Components
- VQ-VAE: Discrete codebook learning for structured representations
- CFG: Classifier-Free Guidance adapted for discrete state-space models
- MDM vs DFM: Ablation study comparing Masked Diffusion and Discrete Flow Matching
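As an illustration of the CFG component, the sketch below applies classifier-free guidance to per-token logits, which is one common way to carry the idea over to a discrete state space. It assumes the convention guided = unconditional + scale * (conditional - unconditional), so a scale of 1 recovers the conditional model; other conventions exist, and the function names `softmax` and `cfg_probs` are hypothetical rather than taken from the project.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cfg_probs(cond_logits, uncond_logits, scale):
    """Blend conditional and unconditional logits, then normalize.

    Assumed convention: scale = 0 gives the unconditional distribution,
    scale = 1 the conditional one, and scale > 1 extrapolates past it.
    """
    guided = [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]
    return softmax(guided)


# Hypothetical logits over a 3-token vocabulary for one sequence position.
cond = [2.0, 0.0, 0.0]
uncond = [0.0, 0.0, 0.0]
plain = cfg_probs(cond, uncond, scale=1.0)
sharp = cfg_probs(cond, uncond, scale=2.0)
```

With these toy logits, raising the scale moves probability mass toward the token the condition favors, which is the intended effect of guidance.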
Target Tasks
- Stroke infilling
- Layout generation with given strokes
- Sketch generation with given layout
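One hypothetical way to express these tasks in a single model is to treat each as a choice of which tokens are given and which must be generated. The sketch below assumes a flat token sequence and a reserved mask id; `MASK`, `build_condition`, and `clamp` are illustrative names, and the actual token layout used by the project is not specified here.

```python
MASK = -1  # hypothetical id reserved for the mask token


def build_condition(tokens, keep):
    """Mask every position not listed in `keep`.

    Returns the masked sequence and a per-position flag marking given tokens.
    """
    is_given = [i in keep for i in range(len(tokens))]
    masked = [tok if given else MASK for tok, given in zip(tokens, is_given)]
    return masked, is_given


def clamp(sampled, reference, is_given):
    """After a sampling step, restore the given tokens so they never change."""
    return [ref if given else s for s, ref, given in zip(sampled, reference, is_given)]


# Stroke infilling on a toy sequence: positions 0 and 3 are given.
tokens = [5, 3, 7, 1]
masked, is_given = build_condition(tokens, keep={0, 3})
restored = clamp([9, 9, 9, 9], tokens, is_given)
```

Under this framing, switching between infilling and the two conditional generation tasks only changes the `keep` set, not the model.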
Datasets
| Dataset | Description |
|---|---|
| MNIST | 32x32 grayscale handwritten digits |
| QuickDraw | Simple stroke-based sketches |
| Creative Sketch | Complex multi-stroke sketches with labels |