Captions are available in the video options.
We present an approach for recommending a music track for a given video, and vice versa. We model the long-term temporal context of both signals, which lets our model capture the high-level artistic correspondences between them. The model learns a strong audiovisual representation that allows us to retrieve video and music pairings that look and sound natural to humans. On the left we show query video segments with the corresponding retrieved music segments, and on the right we show the opposite retrieval direction. The learned audiovisual correspondence exploits artistic attributes such as music genre or rhythm.
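To make the two retrieval directions concrete, here is a minimal sketch of embedding-based retrieval. It assumes, purely for illustration, that each video or music segment has already been mapped to a fixed-length vector in a shared space and that candidates are ranked by cosine similarity. The scoring function, the names `cosine` and `retrieve`, and the toy vectors are all hypothetical and are not taken from our model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query, candidates, k=1):
    """Return the names of the k candidates most similar to the query embedding."""
    ranked = sorted(
        candidates,
        key=lambda name: cosine(query, candidates[name]),
        reverse=True,
    )
    return ranked[:k]

# Toy embeddings (assumed values, for illustration only).
video_query = [0.9, 0.1, 0.0]
music_library = {
    "track_a": [0.8, 0.2, 0.1],
    "track_b": [0.0, 0.1, 0.9],
    "track_c": [0.1, 0.9, 0.0],
}

# Video-to-music retrieval: rank music tracks against a video query.
best = retrieve(video_query, music_library, k=2)
print(best)
```

The same `retrieve` function covers the opposite direction: pass a music embedding as the query and a dictionary of video embeddings as the candidates.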
The webpage template was inspired by this project page.