It’s Time for Artistic Correspondence in Music and Video

Dídac Surís^♭, Carl Vondrick^♭, Bryan Russell^♯, Justin Salamon^♯

^♭Columbia University
^♯Adobe

CVPR 2022

Captions are available in the video options.

Abstract

We present an approach for recommending a music track for a given video, and vice versa. We model the long-term temporal context of both signals, allowing our model to capture the high-level artistic correspondences between them. Our model learns a strong audiovisual representation that allows us to retrieve videos and music that look and sound natural to humans. On the left we show query video segments with the corresponding retrieved music segments, and on the right we show the opposite retrieval direction. Our model's audiovisual correspondence exploits artistic attributes such as music genre or rhythm.

Paper

PDF

@inproceedings{suris_musicforvideo,
  title={It's Time for Artistic Correspondence in Music and Video},
  author={Sur\'is, D\'idac and Vondrick, Carl and Russell, Bryan and Salamon, Justin},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

Contributions

Model long-term temporal context to capture artistic (not just physical) correspondence between video and music
Build on state-of-the-art pretrained visual and music features
Train with an InfoNCE contrastive loss
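
To make the training objective concrete, here is a minimal sketch of an InfoNCE loss over a batch of paired video and music embeddings. It is not the authors' implementation. The function name `info_nce`, the cosine-similarity scoring, the temperature of 0.1, and the averaging over both retrieval directions are all assumptions made for illustration.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _diagonal_nll(logits):
    """Mean negative log-softmax of the diagonal (matching-pair) entries."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_denom = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_denom - row[i]
    return total / len(logits)


def info_nce(video_emb, music_emb, temperature=0.1):
    """InfoNCE over a batch where video_emb[i] and music_emb[i] are a true pair.

    Assumed for illustration: cosine similarity, temperature=0.1, and a
    symmetric loss averaged over video-to-music and music-to-video.
    """
    v = [_normalize(x) for x in video_emb]
    m = [_normalize(x) for x in music_emb]
    # logits[i][j] is the scaled similarity between video i and music j
    logits = [
        [sum(a * b for a, b in zip(vi, mj)) / temperature for mj in m]
        for vi in v
    ]
    logits_t = [list(col) for col in zip(*logits)]  # music-to-video direction
    return 0.5 * (_diagonal_nll(logits) + _diagonal_nll(logits_t))


# Toy usage: correctly paired embeddings should give a lower loss than shuffled ones.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Every other item in the batch serves as a negative for a given pair, so minimizing the loss pulls matching video and music embeddings together and pushes mismatched ones apart.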

Poster

Acknowledgements

The webpage template was inspired by this project page.