End-to-End Fluent Speech Transcription using Hidden Unit BERT
Using raw audio and pretrained encoders for robust, end-to-end speech transcription.
Variations and Relaxations of Normalizing Flows
This paper covers model classes that emerge from relaxing invertibility constraints in normalizing flows and explores their relationship to VAEs, score-based diffusion, and the broader family of generative models.
Multi-Modal Inductive Graph Learning with Zillow
Learning connections in large, multimodal graphs (text+image) using CLIP priors.