
Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12008/51249
Title: Domain adaptation method and modality gap impact in audio-text models for prototypical sound classification.
Authors: Acevedo, Emiliano
Rocamora, Martín
Fuentes, Magdalena
Type: Conference paper
Keywords: Audio-text models, Modality gap, Domain adaptation, Zero-shot sound classification
Publication date: 2025
Abstract: Audio-text models are widely used in zero-shot environmental sound classification as they alleviate the need for annotated data. However, we show that their performance severely drops in the presence of background sound sources. Our analysis reveals that this degradation is primarily driven by SNR levels of background soundscapes, and independent of background type. To address this, we propose a novel method that quantifies and integrates the contribution of background sources into the classification process, improving performance without requiring model retraining. Our domain adaptation technique enhances accuracy across various backgrounds and SNR conditions. Moreover, we analyze the modality gap between audio and text embeddings, showing that narrowing this gap improves classification performance. The method generalizes effectively across state-of-the-art prototypical approaches, showcasing its scalability and robustness for diverse environments.
Link: https://www.isca-archive.org/interspeech_2025/acevedo25_interspeech.html#
Publisher: ISCA - International Speech Communication Association.
In: Interspeech 2025, Rotterdam, The Netherlands, 17-21 Aug. 2025, pp. 1328-1332.
Citation: Acevedo, E., Rocamora, M. and Fuentes, M. Domain adaptation method and modality gap impact in audio-text models for prototypical sound classification [online]. In: Interspeech 2025, Rotterdam, The Netherlands, 17-21 Aug. 2025, pp. 1328-1332. DOI: 10.21437/Interspeech.2025-886.
Academic department: Signal Processing (Procesamiento de Señales)
Research group: Audio Processing (Procesamiento de Audio, GPA)
License: Creative Commons Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND 4.0)
Appears in collections: Academic and scientific publications (Publicaciones académicas y científicas) - Instituto de Ingeniería Eléctrica

Files in this item:
File       Description        Size       Format
ARF25.pdf  Published version  405.38 kB  Adobe PDF


This item is subject to a Creative Commons license.