Review

Model

End-to-end WaveNet

Parameter-based, two-stage

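The transformation network's main job is to learn a mapping from effect parameters to audio, so that the loss computed at the end is differentiable with respect to those parameters.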

In its training stage, it only needs to imitate the input/output behavior of the audio effect directly.
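To make the two-stage idea concrete, here is a minimal sketch in plain Python. It is not the architecture from the reviewed work: the effect is a hypothetical one-parameter gain, the proxy is a hypothetical two-weight linear model, and the names `black_box_effect`, `proxy`, `train_proxy`, and `estimate_parameter` are all made up for illustration. Stage 1 fits the proxy to input/output pairs of the effect; stage 2 freezes the proxy and runs gradient descent on the parameter through it.

```python
import random

def black_box_effect(x, p):
    """Stand-in for an audio effect we can call but not differentiate (assumed: a gain)."""
    return p * x

def proxy(x, p, w):
    """Tiny hypothetical proxy network: linear in the features [x, p * x]."""
    return w[0] * x + w[1] * p * x

def train_proxy(steps=5000, lr=0.05, seed=0):
    """Stage 1: fit the proxy to the effect's input/output pairs with SGD."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)  # random input sample
        p = rng.uniform(0.0, 2.0)   # random effect parameter
        err = proxy(x, p, w) - black_box_effect(x, p)
        # Gradient of 0.5 * err**2 with respect to each weight.
        w[0] -= lr * err * x
        w[1] -= lr * err * p * x
    return w

def estimate_parameter(w, xs, targets, steps=500, lr=0.5):
    """Stage 2: gradient descent on p through the frozen proxy."""
    p = 0.0
    for _ in range(steps):
        grad = 0.0
        for x, t in zip(xs, targets):
            err = proxy(x, p, w) - t
            grad += err * w[1] * x  # d(proxy)/dp = w[1] * x
        p -= lr * grad / len(xs)
    return p

w = train_proxy()
xs = [i / 10.0 - 1.0 for i in range(21)]
targets = [black_box_effect(x, 1.5) for x in xs]  # audio processed with p = 1.5
p_hat = estimate_parameter(w, xs, targets)
```

In this toy setting `p_hat` should land close to the true parameter 1.5, because the loss gradient reaches `p` through the proxy rather than through the black-box effect itself.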

The encoder is VGGish, Google's VGG variant designed specifically for encoding audio; here it maps the input to a 128-dimensional hidden embedding.
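As a rough guide to how many 128-dimensional embeddings such an encoder yields, the sketch below does the framing arithmetic only. It assumes the published VGGish defaults (16 kHz mono input, 25 ms window, 10 ms hop, non-overlapping 96-frame examples of about 0.96 s each); the notes do not say which settings were actually used, and the helper names `num_examples` and `embedding_shape` are hypothetical.

```python
# Assumed VGGish defaults; the reviewed work may use different settings.
SAMPLE_RATE = 16000          # Hz, mono
WINDOW = 400                 # 25 ms STFT window, in samples
HOP = 160                    # 10 ms STFT hop, in samples
FRAMES_PER_EXAMPLE = 96      # one example covers about 0.96 s
EMBEDDING_DIM = 128          # size of each output embedding

def num_examples(duration_s):
    """Count the non-overlapping 96-frame examples in a clip of the given length."""
    n_samples = int(round(duration_s * SAMPLE_RATE))
    if n_samples < WINDOW:
        return 0
    n_frames = 1 + (n_samples - WINDOW) // HOP
    if n_frames < FRAMES_PER_EXAMPLE:
        return 0
    return 1 + (n_frames - FRAMES_PER_EXAMPLE) // FRAMES_PER_EXAMPLE

def embedding_shape(duration_s):
    """Shape of the embedding matrix: (number of examples, 128)."""
    return (num_examples(duration_s), EMBEDDING_DIM)
```

Under these assumptions, `embedding_shape(10.0)` gives `(10, 128)`: roughly one embedding per 0.96 s of audio. A downstream model therefore has to pool or otherwise aggregate over the first axis if it needs a single 128-dimensional vector per clip.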

Self-supervised

reference

Dataset

Single

MTG-Jamendo: over 55,000 full audio tracks with 195 tags

Previous work has used this dataset to train SimCLR as a music encoder.

<aside> 💡 We present the MTG-Jamendo Dataset, a new open dataset for music auto-tagging. It is built using music available at Jamendo under Creative Commons licenses and tags provided by content uploaders. The dataset contains over 55,000 full audio tracks with 195 tags from genre, instrument, and mood/theme categories. We provide elaborated data splits for researchers and report the performance of a simple baseline approach on five different sets of tags: genre, instrument, mood/theme, top-50, and overall.

</aside>
