Multimodal image and audio music transcription

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/119385
Item information
Title: Multimodal image and audio music transcription
Author(s): Fuente Torres, Carlos de la | Valero-Mas, Jose J. | Castellanos, Francisco J. | Calvo-Zaragoza, Jorge
Research group(s): Reconocimiento de Formas e Inteligencia Artificial
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: Multimodal recognition | Automatic music transcription | Optical music recognition | Deep learning
Knowledge area(s): Lenguajes y Sistemas Informáticos
Publication date: 11-Nov-2021
Publisher: Springer Nature
Citation: International Journal of Multimedia Information Retrieval. 2022, 11: 77-84. https://doi.org/10.1007/s13735-021-00221-6
Abstract: Optical Music Recognition (OMR) and Automatic Music Transcription (AMT) denote the research fields that aim at obtaining a structured digital representation from sheet music images and acoustic recordings, respectively. While these fields have traditionally evolved independently, the fact that both tasks may share the same output representation poses the question of whether they could be combined in a synergistic manner to exploit the individual transcription advantages exhibited by each modality. To evaluate this hypothesis, this paper presents a multimodal framework that combines the predictions from two neural end-to-end OMR and AMT systems by considering a local alignment approach. We assess several experimental scenarios with monophonic music pieces to evaluate our approach under different conditions of the individual transcription systems. In general, the multimodal framework clearly outperforms the single recognition modalities, attaining a relative improvement close to 40% in the best case. Our initial premise is, therefore, validated, thus opening avenues for further research in multimodal OMR-AMT transcription.
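The fusion idea described in the abstract, combining two transcription hypotheses through local alignment, can be illustrated with a minimal sketch. This is a generic Smith-Waterman local alignment over symbol sequences with hypothetical music tokens and scoring parameters; it is not the paper's actual fusion method, only an illustration of the underlying technique:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment of two symbol sequences; returns the best score
    and the symbols on which both hypotheses agree along that alignment."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell to collect agreed symbols.
    i, j = best_pos
    agreed = []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            if a[i - 1] == b[j - 1]:
                agreed.append(a[i - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, list(reversed(agreed))

# Hypothetical OMR and AMT outputs (pitch+duration tokens, made up here):
omr = ["C4q", "D4q", "E4h", "rest", "G4q"]
amt = ["C4q", "D4q", "E4h", "F4q", "G4q"]
score, agreed = smith_waterman(omr, amt)
```

Where the two modalities agree, the symbol can be accepted with high confidence; where they disagree (here `rest` vs. `F4q`), a fusion policy would have to choose between them, e.g., using each system's posterior probabilities.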
Sponsor(s): Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research was partially funded by the Spanish “Ministerio de Ciencia e Innovación” through project MultiScore (PID2020-118447RA-I00). The first author acknowledges the support from the Spanish “Ministerio de Educación y Formación Profesional” through grant 20CO1/000966. The second and third authors acknowledge support from the “Programa I+D+i de la Generalitat Valenciana” through grants ACIF/2019/042 and APOSTD/2020/256, respectively.
URI: http://hdl.handle.net/10045/119385
ISSN: 2192-6611 (Print) | 2192-662X (Online)
DOI: 10.1007/s13735-021-00221-6
Language: eng
Type: info:eu-repo/semantics/article
Rights: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Peer reviewed: yes
Publisher version: https://doi.org/10.1007/s13735-021-00221-6
Appears in collections: INV - GRFIA - Artículos de Revistas

Files in this item:
File: de-la-Fuente_etal_2022_IntJMultimedInfoRetr.pdf | Size: 302.4 kB | Format: Adobe PDF


This item is licensed under a Creative Commons License.