
Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12008/43537
Full metadata record
DC Field	Value	Language
dc.contributor.author	Zinemanas, Pablo	es
dc.contributor.author	Arias, Pablo	es
dc.contributor.author	Haro, Gloria	es
dc.contributor.author	Gomez, Emilia	es
dc.date.accessioned	2024-04-16T16:21:16Z	-
dc.date.available	2024-04-16T16:21:16Z	-
dc.date.issued	2017	es
dc.date.submitted	20240416	es
dc.identifier.citation	Gómez, E., Arias, P., Zinemanas, P., Haro, G. "Visual music transcription of clarinet video recordings trained with audio-based labelled data." Published in: Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22-29 Oct. 2017, pp. 463-470, doi: 10.1109/ICCVW.2017.62.	es
dc.identifier.uri	https://hdl.handle.net/20.500.12008/43537	-
dc.description	Paper presented at the International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22-29 Oct. 2017.	es
dc.description.abstract	Automatic transcription is a well-known task in the music information retrieval (MIR) domain, consisting in the computation of a symbolic music representation (e.g., MIDI) from an audio recording. In this work, we address the automatic transcription of video recordings when the audio modality is missing or of insufficient quality, and thus analyze the visual information. We focus on the clarinet, which is played by opening and closing a set of holes and keys. We propose a method for automatic visual note estimation that detects the fingertips of the player and measures their displacement with respect to the holes and keys of the clarinet. To this aim, we track the clarinet and determine its position in every frame. The relative positions of the fingertips are used as features of a machine learning algorithm trained for note-pitch classification. For that purpose, a dataset is built in a semi-automatic way by estimating pitch information from the audio signals of an existing collection of 4.5 hours of video recordings of six different songs performed by nine different players. Our results confirm the difficulty of visual compared with audio-based automatic transcription, mainly due to motion blur and occlusions that cannot be resolved with a single view.	es
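The pipeline outlined in the abstract — fingertip displacements relative to the clarinet's holes and keys, fed as features to a classifier trained on audio-derived pitch labels — can be illustrated with a minimal sketch. This is not the authors' code: the feature layout, the note labels, and the nearest-centroid classifier standing in for their learned model are all assumptions made for the example.

```python
import numpy as np

def fingertip_features(fingertips, holes):
    """Displacement of each tracked fingertip from its associated hole/key,
    flattened into a single feature vector (an assumed feature layout)."""
    fingertips = np.asarray(fingertips, dtype=float)
    holes = np.asarray(holes, dtype=float)
    return (fingertips - holes).ravel()

class NearestCentroidPitchClassifier:
    """Toy stand-in for the learned note-pitch classifier in the paper."""
    def fit(self, X, y):
        labels = sorted(set(y))
        # One centroid per note label, averaged over its training vectors.
        self.centroids_ = {
            lab: np.mean([x for x, t in zip(X, y) if t == lab], axis=0)
            for lab in labels
        }
        return self

    def predict(self, X):
        # Assign each feature vector to the closest note centroid.
        return [min(self.centroids_,
                    key=lambda lab: np.linalg.norm(x - self.centroids_[lab]))
                for x in X]

# Synthetic two-note example: the notes differ only in whether the first
# fingertip covers its hole (small displacement) or lifts off it (large).
holes = [[0.0, 0.0], [1.0, 0.0]]
X = [fingertip_features([[0.0, 0.05], [1.0, 0.0]], holes),  # hole 1 covered
     fingertip_features([[0.0, 0.90], [1.0, 0.0]], holes)]  # hole 1 open
y = ["E3", "F3"]  # hypothetical note labels, e.g. from audio pitch estimation

clf = NearestCentroidPitchClassifier().fit(X, y)
print(clf.predict([fingertip_features([[0.0, 0.8], [1.0, 0.0]], holes)]))  # → ['F3']
```

In the paper the labels come from audio-based pitch estimation on the same recordings, which is what makes the semi-automatic dataset construction possible.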
dc.language	en	es
dc.rights	Works deposited in the Repository are governed by the Intellectual Property Rights Ordinance of the Universidad de la República (Res. Nº 91 of the C.D.C., 8/III/1994 – D.O. 7/IV/1994) and by the Open Repository Ordinance of the Universidad de la República (Res. Nº 16 of the C.D.C., 07/10/2014).	es
dc.subject	Visualization	es
dc.subject	Kalman filters	es
dc.subject	Feature extraction	es
dc.subject	Instruments	es
dc.subject	Video recording	es
dc.subject.other	Procesamiento de Señales	es
dc.title	Visual music transcription of clarinet video recordings trained with audio-based labelled data	es
dc.type	Conference paper	es
dc.rights.licence	Creative Commons Attribution - NonCommercial - NoDerivatives licence (CC BY-NC-ND 4.0)	es
udelar.academic.department	Procesamiento de Señales	-
udelar.investigation.group	Procesamiento de Audio	-
Appears in collections: Publicaciones académicas y científicas - Instituto de Ingeniería Eléctrica

Files in this item:
File	Description	Size	Format
ZAHG17.pdf		1.52 MB	Adobe PDF


This item is licensed under a Creative Commons License.