english Icono del idioma   español Icono del idioma  

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12008/34296 How cite
Title: Data quality maintenance in Data Integration Systems
Authors: Marotta, Adriana
Obtained title: Doctor en Informática
University or service that grants the title: Universidad de la República (Uruguay). Facultad de Ingeniería
Tutor: Ruggia, Raúl
Type: Tesis de doctorado
Issue Date: 2008
Abstract: A Data Integration System (DIS) is an information system that integrates data from a set of heterogeneous and autonomous information sources and provides it to users. Quality in these systems consists of various factors that are measured in data. Some of the usually considered ones are completeness, accuracy, accessibility, freshness, availability. In a DIS, quality factors are associated to the sources, to the extracted and transformed information, and to the information provided by the DIS to the user. At the same time, the user has the possibility of posing quality requirements associated to his data requirements. DIS Quality is considered as better, the nearer it is to the user quality requirements. DIS quality depends on data sources quality, on data transformations and on quality required by users. Therefore, DIS quality is a property that varies in function of the variations of these three other properties. The general goal of this thesis is to provide mechanisms for maintaining DIS quality at a level that satisfies the user quality requirements, minimizing the modifications to the system that are generated by quality changes. The proposal of this thesis allows constructing and maintaining a DIS that is tolerant to quality changes. This means that the DIS is constructed taking into account previsions of quality behavior, such that if changes occur according to these previsions the system is not affected at all by them. These previsions are provided by models of quality behavior of DIS data, which must be maintained up to date. With this strategy, the DIS is affected only when quality behavior models change, instead of being affected each time there is a quality variation in the system. The thesis has a probabilistic approach, which allows modeling the behavior of the quality factors at the sources and at the DIS, allows the users to state flexible quality requirements (using probabilities), and provides tools, such as certainty, mathematical expectation, etc., that help to decide which quality changes are relevant to the DIS quality. The probabilistic models are monitored in order to detect source quality changes, strategy that allows detecting changes on quality behavior and not only punctual quality changes. We propose to monitor also other DIS properties that affect its quality, and for each of these changes decide if they affect the behavior of DIS quality, taking into account DIS quality models. Finally, the probabilistic approach is also applied at the moment of determining actions to take in order to improve DIS quality. For the interpretation of DIS situation we propose to use statistics, which include, in particular, the history of the quality models.
Publisher: Udelar. FI
ISSN: 1688-2776
Citation: Marotta, A. Data quality maintenance in Data Integration Systems [en línea] Tesis de doctorado. Montevideo : Udelar. FI. INCO : PEDECIBA, 2008.
License: Licencia Creative Commons Atribución - No Comercial - Sin Derivadas (CC - By-NC-ND 4.0)
Appears in Collections:Tesis de posgrado - Instituto de Computación

Files in This Item:
File Description SizeFormat  
Mar08.pdfTesis de Doctorado2,11 MBAdobe PDFView/Open


This item is licensed under a Creative Commons License Creative Commons