Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.12008/51249
Title: | Domain adaptation method and modality gap impact in audio-text models for prototypical sound classification. |
Authors: | Acevedo, Emiliano; Rocamora, Martín; Fuentes, Magdalena |
Type: | Conference paper |
Keywords: | Audio-text models, Modality gap, Domain adaptation, Zero-shot sound classification |
Publication date: | 2025 |
Abstract: | Audio-text models are widely used in zero-shot environmental sound classification as they alleviate the need for annotated data. However, we show that their performance severely drops in the presence of background sound sources. Our analysis reveals that this degradation is primarily driven by SNR levels of background soundscapes, and independent of background type. To address this, we propose a novel method that quantifies and integrates the contribution of background sources into the classification process, improving performance without requiring model retraining. Our domain adaptation technique enhances accuracy across various backgrounds and SNR conditions. Moreover, we analyze the modality gap between audio and text embeddings, showing that narrowing this gap improves classification performance. The method generalizes effectively across state-of-the-art prototypical approaches, showcasing its scalability and robustness for diverse environments. |
Link: | https://www.isca-archive.org/interspeech_2025/acevedo25_interspeech.html# |
Publisher: | ISCA - International Speech Communication Association. |
In: | Interspeech 2025, Rotterdam, The Netherlands, 17-21 Aug. 2025, pp. 1328-1332. |
Citation: | Acevedo, E., Rocamora, M. and Fuentes, M. Domain adaptation method and modality gap impact in audio-text models for prototypical sound classification [online]. In: Interspeech 2025, Rotterdam, The Netherlands, 17-21 Aug. 2025, pp. 1328-1332. DOI: 10.21437/Interspeech.2025-886. |
Academic department: | Signal Processing (Procesamiento de Señales) |
Research group: | Audio Processing (Procesamiento de Audio, GPA) |
License: | Creative Commons Attribution-NonCommercial-NoDerivatives license (CC BY-NC-ND 4.0) |
Appears in collections: | Academic and scientific publications - Instituto de Ingeniería Eléctrica |
Files in this item:
File | Description | Size | Format |
---|---|---|---|
ARF25.pdf | Published version | 405.38 kB | Adobe PDF |