
Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12008/29755
Title: LETEO: Scalable anonymization of big data and its application to learning analytics
Authors: Giménez, Eduardo
Etcheverry, Lorena
Olmedo, Federico
Buil Aranda, Carlos
Toro, Matías
Pastorini, Marcos
Type: Technical report
Keywords: Anonymization, Big data, Learning analytics
Geographic coverage: Uruguay.
Issue Date: 2021
Abstract: Created in 2007, Plan Ceibal is an inclusion and equal-opportunity plan that aims to support Uruguayan educational policies with technology. Over the years, and in the course of its duties, Ceibal has accumulated a substantial amount of data on the use of technology in education, which it needs to manage the plan and fulfill its legally assigned tasks. However, this data cannot be studied without addressing the problem of de-identifying the users of the Plan. To exploit this data, Ceibal has deployed an instance of the Hortonworks Data Platform (HDP), an open source platform for the storage and parallel processing of massive data (big data). HDP offers a wide range of functional components, from large file storage (HDFS) to distributed programming of machine learning algorithms (Apache Spark / MLlib). However, there are currently no open source solutions for de-identifying personal data that are integrated into the Hortonworks ecosystem. On the one hand, existing data de-identification tools were not designed to scale easily to large volumes of data, nor do they offer easy integration mechanisms with HDFS. This forces users to export the data from the platform that stores it in order to anonymize it, with the consequent risk of exposing confidential information. On the other hand, the few solutions integrated into the Hortonworks ecosystem are proprietary, and their licenses are very costly. The objective of this project is to promote the use of the enormous amount of educational and technological data that Ceibal holds by removing one of the greatest obstacles to that use, namely the preservation of privacy and the protection of the personal data of the Plan's beneficiaries. To this end, the project seeks to produce anonymization tools that extend the HDP platform. In particular, it seeks to develop open source modules, to be integrated into that platform, that implement a set of anonymization techniques and algorithms programmed in a distributed manner using Apache Spark and applicable to data sets stored in HDFS files.
Description: ANII Fondo sectorial de investigación con datos - 2018
Publisher: Udelar. FI.
Citation: Giménez, E., Etcheverry, L., Olmedo, F. et al. LETEO: Scalable anonymization of big data and its application to learning analytics [online]. Montevideo: Udelar. FI., 2021.
License: Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND 4.0)
Appears in Collections: Reportes Técnicos - Instituto de Computación

Files in This Item:
File: GEOBTP21.pdf
Size: 785.02 kB
Format: Adobe PDF

This item is licensed under a Creative Commons License.