Implementation of CNN-LSTM for Music Captioning


Music Information Retrieval
Music Captioning
Convolutional Neural Network
Long Short-Term Memory

How to Cite

Diarsyah, M. G., & Setiawan, D. (2024). Implementasi CNN-LSTM untuk Music Captioning. Media Informatika, 23(1), 21–33.


Music is an integral part of human life, and its influence extends across many industries; for many people it is considered a necessity. With the rise of neural network technology, Music Information Retrieval (MIR) has gained prominence as a multidisciplinary field concerned with processing music information and its applications. One popular approach to music captioning is a multimodal encoder-decoder architecture built on a CNN-LSTM model. In this study, we develop a model that learns jointly from audio and text data. We explore three design choices for modality fusion (early fusion, late fusion, and hybrid fusion) to assess their impact
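The three fusion strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the plain-list feature vectors, the weighted average used for late fusion, and the particular way hybrid fusion combines the two are all assumptions chosen for illustration. In a real CNN-LSTM model the inputs would be learned audio and text embeddings rather than hand-written lists.

```python
from typing import List

Vector = List[float]


def early_fusion(audio_feat: Vector, text_feat: Vector) -> Vector:
    """Early fusion: join modality features into one vector
    before any joint model processes them."""
    return audio_feat + text_feat


def late_fusion(audio_scores: Vector, text_scores: Vector,
                audio_weight: float = 0.5) -> Vector:
    """Late fusion: each modality is scored separately and only the
    outputs are combined (here, an assumed weighted average)."""
    if len(audio_scores) != len(text_scores):
        raise ValueError("score vectors must have the same length")
    return [audio_weight * a + (1.0 - audio_weight) * t
            for a, t in zip(audio_scores, text_scores)]


def hybrid_fusion(audio_feat: Vector, text_feat: Vector,
                  audio_scores: Vector, text_scores: Vector) -> Vector:
    """Hybrid fusion (one assumed variant): keep both the fused
    features and the fused per-modality scores."""
    fused_features = early_fusion(audio_feat, text_feat)
    fused_scores = late_fusion(audio_scores, text_scores)
    return fused_features + fused_scores
```

For example, `early_fusion([1.0, 2.0], [3.0])` yields a single three-element vector, whereas `late_fusion` never sees the raw features and only blends the two score vectors.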
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Copyright (c) 2024 M. Ghazali Diarsyah, Dhanny Setiawan