Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.12008/55007
How to cite
| Title: | Evaluating disentangled representations for controllable music generation |
| Authors: | Ibáñez-Martínez, Laura; Nkama, Chukwuemeka; Poltronieri, Andrea; Serra, Xavier; Rocamora, Martín |
| Type: | Preprint |
| Keywords: | Disentangled representations, Controllable music generation, Evaluation framework |
| Publication date: | 2026 |
| Abstract: | Recent approaches in music generation rely on disentangled representations, often labeled as structure and timbre or local and global, to enable controllable synthesis. Yet the underlying properties of these embeddings remain underexplored. In this work, we evaluate such disentangled representations in a set of music audio models for controllable generation using a probing-based framework that goes beyond standard downstream tasks. The selected models reflect diverse unsupervised disentanglement strategies, including inductive biases, data augmentations, adversarial objectives, and staged training procedures. We further isolate specific strategies to analyze their effect. Our analysis spans four key axes: informativeness, equivariance, invariance, and disentanglement, which are assessed across datasets, tasks, and controlled transformations. Our findings reveal inconsistencies between intended and actual semantics of the embeddings, suggesting that current strategies fall short of producing truly disentangled representations, and prompting a re-examination of how controllability is approached in music generation. |
| Funders: | This work was supported by IA y Música: Cátedra en Inteligencia Artificial y Música (TSI-100929-2023-1), funded by the Secretaría de Estado de Digitalización e Inteligencia Artificial and the European Union (Next Generation EU), and by IMPA: Multimodal AI for Audio Processing (PID2023-152250OB-I00), funded by the Ministerio de Ciencia, Innovación y Universidades of the Spanish Government and the Agencia Estatal de Investigación (AEI), and co-funded by the European Union. |
| Citation: | Ibáñez-Martínez, L., Nkama, C., Poltronieri, A. et al. Evaluating disentangled representations for controllable music generation [Preprint]. Published in: ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 3-8 May 2026, pp. 15092-15096. DOI: 10.1109/ICASSP55912.2026.11461451. |
| License: | Creative Commons Attribution-NonCommercial-NoDerivatives license (CC BY-NC-ND 4.0) |
| Appears in collections: | Publicaciones académicas y científicas - Instituto de Ingeniería Eléctrica |
Files in this item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| INPSR26.pdf | Preprint | 184.91 kB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons license.