Implementasi CNN-LSTM untuk Music Captioning (CNN-LSTM Implementation for Music Captioning)
Keywords

Music Information Retrieval
Music Captioning
Convolutional Neural Network
Long Short-Term Memory

How to Cite

Diarsyah, M. G., & Setiawan, D. (2024). Implementasi CNN-LSTM untuk Music Captioning. Media Informatika, 23(1), 21–33. https://doi.org/10.37595/mediainfo.v23i1.213

Abstract

Music is an integral part of human life, and its influence extends across many industries; for many people it is considered a necessity. With the rise of neural networks, Music Information Retrieval (MIR) has gained prominence as a multidisciplinary field concerned with processing music information and its applications. One popular approach to music captioning is a multimodal encoder-decoder architecture built on a CNN-LSTM model. In this study, we develop a model that learns jointly from audio and text data. We explore different design choices for modality fusion, including early fusion, late fusion, and hybrid fusion, to assess their impact

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Copyright (c) 2024 M. Ghazali Diarsyah, Dhanny Setiawan