Synthetic Aperture Radar (SAR) provides robust all-weather imaging capabilities; however, translating SAR observations into photorealistic optical images remains a fundamentally ill-posed problem. Existing approaches are hindered by the inherent speckle noise and geometric distortions of SAR data, which often lead to semantic misinterpretation, ambiguous texture synthesis, and structural hallucinations.
To address these limitations, we propose OSCAR (Optical-aware Semantic Control for Aleatoric Refinement), a novel SAR-to-Optical (S2O) translation framework built on three core technical contributions: cross-modal alignment through an Optical-Aware SAR Encoder, hierarchical visual prompts, and class-aware text prompts, the latter two injected into the diffusion process by a Semantically-Grounded ControlNet.
Extensive experiments demonstrate that OSCAR achieves superior perceptual quality and semantic consistency compared to state-of-the-art approaches.
The Optical-Aware SAR Encoder bridges the fundamental cross-modal gap by aligning SAR features with a rich optical semantic manifold.
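To make the alignment idea concrete, the sketch below shows one way such an encoder could be trained: a trainable SAR encoder is pulled toward the embedding space of a frozen optical backbone on paired SAR-optical patches. The function name, the cosine-similarity loss, and the assumption that both encoders output (B, D) embeddings are illustrative, not the exact OSCAR formulation.

```python
import torch
import torch.nn.functional as F

def optical_alignment_loss(sar_encoder, optical_encoder, sar_img, opt_img):
    """Pull SAR features toward the optical semantic manifold (sketch).

    `sar_encoder` is trainable; `optical_encoder` is a frozen optical
    backbone (e.g., a CLIP-style image encoder). Both are assumed to
    map images to (B, D) embeddings. Names and loss are illustrative.
    """
    z_sar = F.normalize(sar_encoder(sar_img), dim=-1)        # (B, D)
    with torch.no_grad():                                     # optical branch stays frozen
        z_opt = F.normalize(optical_encoder(opt_img), dim=-1)
    # Cosine-similarity alignment: maximize agreement between paired embeddings.
    return (1.0 - (z_sar * z_opt).sum(dim=-1)).mean()
```

In practice this objective (or a contrastive variant over the batch) would be minimized on paired SAR-optical patches before the diffusion stage, so that SAR features arriving at the generator already live near the optical semantic manifold.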
The Semantically-Grounded ControlNet then performs the translation itself, injecting dual-path semantic guidance (hierarchical visual prompts and class-aware text prompts) into the diffusion process.
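The following sketch illustrates how dual-path guidance could enter a single denoising step, assuming a Hugging Face diffusers-style ControlNet interface: the visual prompt is injected as additive ControlNet residuals, while the class-aware text embeddings condition cross-attention. The function and argument names are illustrative and not the exact OSCAR implementation.

```python
import torch

def denoise_step(unet, controlnet, noisy_latents, t, visual_prompt, text_emb):
    """One denoising step with dual-path semantic guidance (sketch).

    Assumes a `diffusers`-style ControlNet interface: the visual path
    (spatial prompt, e.g., SAR-derived feature maps) enters as additive
    residuals, while the text path (class-aware prompt embeddings)
    conditions cross-attention in both networks.
    """
    # Visual path: spatial guidance produced by the ControlNet branch.
    down_res, mid_res = controlnet(
        noisy_latents, t,
        encoder_hidden_states=text_emb,   # text path also reaches the ControlNet
        controlnet_cond=visual_prompt,
        return_dict=False,
    )
    # Text path drives cross-attention in the UNet, while the ControlNet
    # residuals steer its spatial feature maps at matching resolutions.
    noise_pred = unet(
        noisy_latents, t,
        encoder_hidden_states=text_emb,
        down_block_additional_residuals=down_res,
        mid_block_additional_residual=mid_res,
    ).sample
    return noise_pred
```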
We conduct a quantitative comparison with state-of-the-art SAR-to-optical translation methods on both the BENv2 and SEN12MS datasets. Our OSCAR framework establishes a new state of the art across all metric categories.
We analyze the individual contributions of our three core technical pillars.
The integration of all components (Full Model) achieves the highest quantitative performance across DISTS, LPIPS, and SAM metrics, demonstrating a clear synergy between spatial and global guidance.
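For reference, the sketch below shows how two of these metrics could be computed for such an ablation: LPIPS via the `lpips` package and the Spectral Angle Mapper (SAM) from its standard per-pixel definition; DISTS would follow the same pattern with a corresponding implementation. Tensor names and the evaluation helper are illustrative.

```python
import torch
import lpips  # pip install lpips

def spectral_angle_mapper(pred, target, eps=1e-8):
    """Mean spectral angle (radians) between predicted and reference images.

    `pred`/`target`: (B, C, H, W) tensors; the angle is computed per pixel
    over the channel (spectral) dimension and averaged.
    """
    dot = (pred * target).sum(dim=1)
    denom = pred.norm(dim=1) * target.norm(dim=1) + eps
    cos = torch.clamp(dot / denom, -1.0, 1.0)
    return torch.acos(cos).mean()

# LPIPS with an AlexNet backbone; inputs are expected in [-1, 1].
lpips_fn = lpips.LPIPS(net='alex')

def evaluate_pair(pred, target):
    """Return (LPIPS, SAM) for a batch of translated vs. reference optical images."""
    return lpips_fn(pred, target).mean().item(), spectral_angle_mapper(pred, target).item()
```

Lower LPIPS (and DISTS) indicates better perceptual similarity, while lower SAM indicates better spectral fidelity, which is why the Full Model's improvement across all three suggests complementary spatial and global guidance rather than a trade-off.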
The qualitative results illustrate how each component contributes to the final synthesis as Cross-modal Alignment, Hierarchical Visual Prompts, and Class-aware Text Prompts are progressively integrated.
Ultimately, the Full Model (v) produces the most photorealistic results of any configuration, with the most accurate color distributions and the sharpest boundaries.