On Wednesday, February 3, at 1 p.m., in the Aula Fanfani on the 5th floor of the MEMOTEF Department (Sapienza Università di Roma, via del Castro Laurenziano 9),
the following seminar will be held: Data management with Statistics. Stop messing with the data!
Speaker: Eleonora Laurenza (Department of Statistical Sciences, Sapienza Università di Roma, and Banca d'Italia)
Abstract:
Data fusion is a major task in data integration. Frequently, different sources store data about the same real-world entities but with conflicting values for their features. Data fusion aims to resolve those conflicts in order to obtain a single global view over those sources. Some solutions to the problem have been proposed in the database literature, yet they have a number of limitations in real cases: for example, they leave too many alternatives to users or produce biased results. This paper proposes a novel algorithm for data fusion that uses the probabilistic dependencies among the features to resolve the conflicts, overcoming the limitations of the existing approaches.