Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12008/55115
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Preciozzi, Javier | -
dc.contributor.advisor | Fiori, Marcelo | -
dc.contributor.author | Tayler, Silvana | -
dc.date.accessioned | 2026-05-20T17:27:30Z | -
dc.date.available | 2026-05-20T17:27:30Z | -
dc.date.issued | 2026 | -
dc.identifier.citation | Tayler, S. Optimization of data collection in facial recognition models through subsampling strategies [online]. Master's thesis. Montevideo : Udelar. FI, 2026. | es
dc.identifier.issn | 1688-2806 | -
dc.identifier.uri | https://hdl.handle.net/20.500.12008/55115 | -
dc.description.abstract | Facial recognition systems have achieved remarkable performance in recent years; however, their accuracy remains highly dependent on the quality, diversity, and volume of training data. The widespread use of large-scale datasets, often collected without consent, raises significant ethical and legal concerns, while the storage and computational demands associated with such data present ongoing challenges. This thesis explores subsampling techniques to evaluate whether strategies can be identified that guide data collection, independently of the training process, with the goal of reducing data needs and improving computational efficiency. ArcFace, a state-of-the-art facial recognition model, was selected as the baseline architecture due to its strong feature discrimination and generalization capabilities. Using the MS1M-ArcFace dataset for training and the LFW, AgeDB-30, and CFP-FP benchmarks for evaluation, 53 experiments were conducted. Multiple sampling approaches were compared at both the image and identity levels, including uniform random selection, stratified sampling, k-means clustering, and greedy Maximin selection. The experiments were designed to evaluate the effect of sample representativeness, intra- and inter-class variability, and the proportion of identities in the training set. Results indicate that k-means clustering applied to ArcFace embeddings at the image level achieved the highest overall performance across all benchmark datasets, demonstrating its effectiveness in reducing redundancy while preserving intra-class and inter-class diversity. Alternatively, random sampling at the identity level yields competitive performance compared to more complex strategies, particularly when high intra-class variability is desired. This finding suggests that identity-level random sampling is a valid and cost-effective approach for training data selection, significantly reducing the costs of data collection, storage, and processing. Additionally, k-means clustering may serve as a more suitable alternative in scenarios with a limited number of identities and where greater intra-class variability is not required. These insights are especially relevant in ethically constrained environments, where biometric data collection is restricted by consent. In all cases, clustering can further guide the final image selection process once consent is obtained, enhancing both the efficiency and representativeness of the dataset. | es
dc.format.extent | 57 p. | es
dc.format.mimetype | application/pdf | es
dc.language.iso | en | es
dc.publisher | Udelar.FI. | es
dc.rights | Works deposited in the Repository are governed by the Ordinance on Intellectual Property Rights of the Universidad de la República (Res. No. 91 of the C.D.C. of 8/III/1994 – D.O. 7/IV/1994) and by the Ordinance of the Open Repository of the Universidad de la República (Res. No. 16 of the C.D.C. of 07/10/2014) | es
dc.title | Optimization of data collection in facial recognition models through subsampling strategies | es
dc.type | Master's thesis | es
dc.contributor.filiacion | Tayler Silvana, Universidad de la República (Uruguay). Facultad de Ingeniería. | -
thesis.degree.grantor | Universidad de la República (Uruguay). Facultad de Ingeniería | es
thesis.degree.name | Magíster en Ingeniería Matemática | es
dc.rights.licence | Creative Commons Attribution - NonCommercial - NoDerivatives License (CC BY-NC-ND 4.0) | es
Appears in collections: Tesis de Posgrado - Facultad de Ingeniería
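
The abstract names greedy Maximin selection as one of the subsampling strategies compared. The record does not describe how the thesis implements it, so the following is only a minimal sketch of the general technique (farthest-point traversal). It assumes embeddings are given as plain coordinate tuples and uses Euclidean distance; the function name `greedy_maximin`, the `start` parameter, and the toy 2-D points are hypothetical and chosen for illustration.

```python
import math


def greedy_maximin(points, k, start=0):
    """Pick k indices so that each new pick is as far as possible
    from the points already chosen (farthest-point traversal).

    Illustrative assumption: Euclidean distance on coordinate tuples.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(points)")

    selected = [start]
    chosen = {start}
    # min_dist[i] = distance from point i to its nearest selected point
    min_dist = [math.dist(p, points[start]) for p in points]

    while len(selected) < k:
        # Next pick: the unselected point with the largest nearest-selected distance
        nxt = max((i for i in range(n) if i not in chosen),
                  key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        # Refresh nearest-selected distances against the new pick
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy stand-in for embeddings: two near-duplicates and three spread-out points
toy_points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(greedy_maximin(toy_points, 3))  # → [0, 4, 2]
```

In this toy run the near-duplicate at index 1 is skipped in favor of distant points, which is the redundancy-reducing behavior such a selection rule is meant to provide. Each pick costs one pass over all points, so the sketch runs in O(n·k) distance computations.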

Files in this item:
File | Description | Size | Format
Tay26.pdf | Master's thesis | 7.08 MB | Adobe PDF


This item is subject to a Creative Commons license.