Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.12008/51249
Full metadata record
DC field | Value | Language |
---|---|---|
dc.contributor.author | Acevedo, Emiliano | - |
dc.contributor.author | Rocamora, Martín | - |
dc.contributor.author | Fuentes, Magdalena | - |
dc.date.accessioned | 2025-08-22T17:49:49Z | - |
dc.date.available | 2025-08-22T17:49:49Z | - |
dc.date.issued | 2025 | - |
dc.identifier.citation | Acevedo, E., Rocamora, M. and Fuentes, M. Domain adaptation method and modality gap impact in audio-text models for prototypical sound classification [online]. In: Interspeech 2025, Rotterdam, The Netherlands, 17-21 Aug. 2025, pp. 1328-1332. DOI: 10.21437/Interspeech.2025-886. | es |
dc.identifier.uri | https://www.interspeech2025.org/home | - |
dc.identifier.uri | https://hdl.handle.net/20.500.12008/51249 | - |
dc.description.abstract | Audio-text models are widely used in zero-shot environmental sound classification as they alleviate the need for annotated data. However, we show that their performance severely drops in the presence of background sound sources. Our analysis reveals that this degradation is primarily driven by SNR levels of background soundscapes, and independent of background type. To address this, we propose a novel method that quantifies and integrates the contribution of background sources into the classification process, improving performance without requiring model retraining. Our domain adaptation technique enhances accuracy across various backgrounds and SNR conditions. Moreover, we analyze the modality gap between audio and text embeddings, showing that narrowing this gap improves classification performance. The method generalizes effectively across state-of-the-art prototypical approaches, showcasing its scalability and robustness for diverse environments. | es |
dc.description.uri | https://www.isca-archive.org/interspeech_2025/acevedo25_interspeech.html# | es |
dc.format.extent | 5 p. | es |
dc.format.mimetype | application/pdf | es |
dc.language.iso | en | es |
dc.publisher | ISCA - International Speech Communication Association. | es |
dc.relation.ispartof | Interspeech 2025, Rotterdam, The Netherlands, 17-21 aug. 2025, pp. 1328-1332. | es |
dc.rights | Works deposited in the Repository are governed by the Ordinance on Intellectual Property Rights of the Universidad de la República (Res. No. 91 of the C.D.C., 8/III/1994 – D.O. 7/IV/1994) and by the Ordinance of the Open Repository of the Universidad de la República (Res. No. 16 of the C.D.C., 07/10/2014) | es |
dc.subject | Audio-text models | es |
dc.subject | Modality gap | es |
dc.subject | Domain adaptation | es |
dc.subject | Zero-shot sound classification | es |
dc.title | Domain adaptation method and modality gap impact in audio-text models for prototypical sound classification. | es |
dc.type | Conference paper | es |
dc.contributor.filiacion | Acevedo Emiliano, Universidad de la República (Uruguay). Facultad de Ingeniería. | - |
dc.contributor.filiacion | Rocamora Martín, Universidad de la República (Uruguay). Facultad de Ingeniería. | - |
dc.contributor.filiacion | Fuentes Magdalena, New York University, USA | - |
dc.rights.licence | Creative Commons Attribution - NonCommercial - NoDerivatives License (CC BY-NC-ND 4.0) | es |
dc.identifier.doi | 10.21437/Interspeech.2025-886 | - |
udelar.academic.department | Procesamiento de Señales | es |
udelar.investigation.group | Procesamiento de Audio (GPA) | es |
Appears in collections: | Publicaciones académicas y científicas - Instituto de Ingeniería Eléctrica |
Files in this item:
File | Description | Size | Format |
---|---|---|---|
ARF25.pdf | Published version | 405.38 kB | Adobe PDF |
This item is subject to a Creative Commons license.