
Fusion is not one-size-fits-all: Cross-Modal Representation Alignment for Time-to-Event Modeling

Zac Boring, June 16, 2026

Accurate time-to-event (TTE) prediction from multimodal clinical data remains challenging due to modality imbalance and distribution shift. We introduce a foundation model-driven framework for cross-modal representation alignment between CT imaging and longitudinal EHR data, designed to generalize across tasks and institutions. CT and EHR modalities are encoded independently using domain-specific foundation models and aligned in a shared latent spa

By Zhemin Zhang, Weijie Chen, David Le, Amara Tariq, Alex Wallace, Matthew Stib, Juan Maria Farina, Cha

Read the full article at arXiv cs.AI →