Thursday 23 February, at 12:30, in the Aula di Matematica (Sapienza Università di Roma, Via del Castro Laurenziano 9, Rome - Faculty of Economics, 1st floor)
Speaker: Li-Chun Zhang (University of Southampton and Statistics Norway)
Seminar title: "On secondary analysis of datasets that cannot be linked without errors"
Abstract:
Unless a unique identifier exists for linking them, probabilistic linkage of separate datasets will generate linkage errors, which can bias the subsequent analysis if the linked data are treated as if they were truly observed. We adopt the perspective of secondary analysts, whom we assume to have full access neither to the linkage key variables nor to the details or tools of the actual linkage procedure, and who are provided only with some non-disclosive linkage comparison data about the record linkage precision or how the records compare to each other. Three different approaches to secondary analysis are investigated. First, it is shown that maximum likelihood estimation may be biased and inconsistent, so that likelihood inference under linkage error models that generate the linked data is not viable in general. Next, using linear regression as the case in point, an approach to unbiased regression estimation is devised that does not require actually linking the separate datasets. Third, the conditions under which valid analysis can be obtained from a subset of all the links that could otherwise have been made are investigated. This can be useful for the analysis of large population datasets, such as those arising from the Census data linkage project at ONS, where the required computation is infeasible if one attempts to use all the linkable and unlinkable records.
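As a rough illustration of why linkage errors matter for regression, the following Python sketch simulates a simple linear model and then mislinks a fraction of the records by shuffling their responses among themselves. This is not the speaker's method: the function names, the 20% mislink rate, the sample size, and the assumption that wrong links are made completely at random are all hypothetical choices made for the example. Under these assumptions the naive least-squares slope on the linked data is pulled towards zero, roughly in proportion to the share of correct links.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=5000, beta=2.0, mislink_rate=0.2, seed=0):
    """Return (slope on true pairs, slope on erroneously linked pairs).

    Assumption for illustration only: a random subset of records has its
    y values shuffled within the subset, mimicking false links that are
    unrelated to x and y.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [1.0 + beta * xi + rng.gauss(0, 1) for xi in x]

    # Pick the records that will be (potentially) mislinked.
    idx = rng.sample(range(n), int(mislink_rate * n))
    shuffled = idx[:]
    rng.shuffle(shuffled)

    # Record i receives the response that truly belongs to record j.
    y_linked = y[:]
    for i, j in zip(idx, shuffled):
        y_linked[i] = y[j]

    return ols_slope(x, y), ols_slope(x, y_linked)


slope_true, slope_linked = simulate()
print(f"slope with true pairs:   {slope_true:.3f}")
print(f"slope with linked pairs: {slope_linked:.3f}")
```

With a true slope of 2 and about 80% correct links, the second estimate should land near 1.6 rather than 2, which is the kind of bias that arises when linked data are treated as if they were truly observed.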