Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.12008/31397
How to cite
| Title: | Urban sound & sight : Dataset and benchmark for audio-visual urban scene understanding |
| Authors: | Fuentes, Magdalena; Steers, Bea; Zinemanas, Pablo; Rocamora, Martín; Bondi, Luca; Wilkins, Julia; Shi, Qianyi; Hou, Yao; Das, Samarjit; Serra, Xavier; Bello, Juan Pablo |
| Type: | Conference paper |
| Keywords: | Location awareness, Training, Industries, Annotations, Conferences, Signal processing, Benchmark testing, Audio-visual, Urban research, Traffic, Dataset |
| Publication date: | 2022 |
| Abstract: | Automatic audio-visual urban traffic understanding is a growing area of research with many potential applications of value to industry, academia, and the public sector. Yet, the lack of well-curated resources for training and evaluating models to research in this area hinders their development. To address this we present a curated audio-visual dataset, Urban Sound & Sight (Urbansas), developed for investigating the detection and localization of sounding vehicles in the wild. Urbansas consists of 12 hours of unlabeled data along with 3 hours of manually annotated data, including bounding boxes with classes and unique id of vehicles, and strong audio labels featuring vehicle types and indicating off-screen sounds. We discuss the challenges presented by the dataset and how to use its annotations for the localization of vehicles in the wild through audio models. |
| Publisher: | IEEE |
| In: | ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23-27 May 2022, pp. 141-145. |
| Citation: | Fuentes, M., Steers, B., Zinemanas, P. et al. Urban sound & sight : Dataset and benchmark for audio-visual urban scene understanding [online]. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23-27 May, pp 141-145. Piscataway, NJ : IEEE, 2022. DOI 10.1109/ICASSP43922.2022.9747644 |
| Academic department: | Procesamiento de Señales (Signal Processing) |
| Research group: | Procesamiento de Audio (Audio Processing) |
| License: | Creative Commons Attribution-NonCommercial-NoDerivatives license (CC BY-NC-ND 4.0) |
| Appears in collections: | Publicaciones académicas y científicas - Instituto de Ingeniería Eléctrica |
Files in this item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| FSZRBWSHDSB22.pdf | Camera-Ready | 5.55 MB | Adobe PDF | View/Open |
This item is subject to a Creative Commons license.