Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.12008/55115
Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Preciozzi, Javier | - |
| dc.contributor.advisor | Fiori, Marcelo | - |
| dc.contributor.author | Tayler, Silvana | - |
| dc.date.accessioned | 2026-05-20T17:27:30Z | - |
| dc.date.available | 2026-05-20T17:27:30Z | - |
| dc.date.issued | 2026 | - |
| dc.identifier.citation | Tayler, S. Optimization of data collection in facial recognition models through subsampling strategies [online]. Master's thesis. Montevideo: Udelar. FI, 2026. | es |
| dc.identifier.issn | 1688-2806 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.12008/55115 | - |
| dc.description.abstract | Facial recognition systems have achieved remarkable performance in recent years; however, their accuracy remains highly dependent on the quality, diversity, and volume of training data. The widespread use of large-scale datasets, often collected without consent, raises significant ethical and legal concerns, while the storage and computational demands associated with such data present ongoing challenges. This thesis explores subsampling techniques to evaluate whether strategies can be identified that guide data collection, independently of the training process, with the goal of reducing data needs and improving computational efficiency. ArcFace, a state-of-the-art facial recognition model, was selected as the baseline architecture due to its strong feature discrimination and generalization capabilities. Using the MS1M-ArcFace dataset for training and the LFW, AgeDB-30, and CFP-FP benchmarks for evaluation, 53 experiments were conducted. Multiple sampling approaches were compared, including uniform random selection, stratified sampling, k-means clustering, and greedy Maximin selection. Subsampling was explored at both the image and identity levels, with experiments designed to evaluate the effect of sample representativeness, intra- and inter-class variability, and the proportion of identities in the training set. Results indicate that k-means clustering applied to ArcFace embeddings at the image level achieved the highest overall performance across all benchmark datasets, demonstrating its effectiveness in reducing redundancy while preserving intra-class and inter-class diversity. Alternatively, random sampling at the identity level yielded competitive performance compared to more complex strategies, particularly when high intra-class variability is desired. This finding suggests that identity-level random sampling is a valid and cost-effective approach for training data selection, significantly reducing the costs of data collection, storage, and processing. Additionally, k-means clustering may serve as a more suitable alternative in scenarios with a limited number of identities and where greater intra-class variability is not required. These insights are especially relevant in ethically constrained environments, where biometric data collection is restricted by consent. In all cases, clustering can further guide the final image selection process once consent is obtained, enhancing both the efficiency and representativeness of the dataset. | es |
| dc.format.extent | 57 p. | es |
| dc.format.mimetype | application/pdf | es |
| dc.language.iso | en | es |
| dc.publisher | Udelar.FI. | es |
| dc.rights | Works deposited in the Repository are governed by the Intellectual Property Rights Ordinance of the Universidad de la República (Res. No. 91 of the C.D.C., 8/III/1994; D.O. 7/IV/1994) and by the Open Repository Ordinance of the Universidad de la República (Res. No. 16 of the C.D.C., 07/10/2014) | es |
| dc.title | Optimization of data collection in facial recognition models through subsampling strategies | es |
| dc.type | Master's thesis | es |
| dc.contributor.filiacion | Tayler Silvana, Universidad de la República (Uruguay). Facultad de Ingeniería. | - |
| thesis.degree.grantor | Universidad de la República (Uruguay). Facultad de Ingeniería | es |
| thesis.degree.name | Magíster en Ingeniería Matemática | es |
| dc.rights.licence | Creative Commons Attribution - NonCommercial - NoDerivatives License (CC BY-NC-ND 4.0) | es |
| Appears in collections: | Tesis de Posgrado - Facultad de Ingeniería | |
Files in this item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| Tay26.pdf | Master's thesis | 7.08 MB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons license.