Explainable Data Matching: Selecting Representative Pairs with Active Learning Pair-Selection Strategies
Document type
Text/Conference Paper
Date
2023
Publisher
Gesellschaft für Informatik e.V.
Abstract
In both research and enterprise settings, dirty data poses numerous challenges. Many data cleaning pipelines include a deduplication step that detects and removes entries within a given dataset that refer to the same real-world entity. While developing such deduplication techniques, data scientists have to make sense of the large result sets that their matching solutions generate in order to quickly identify changes in behavior or discover opportunities for improvement. We propose an approach that aims to select, from the result set of a data matching solution, a small subset of pairs that is representative of the matching solution’s overall behavior. To evaluate our approach, we show that a matching solution trained on pairs selected according to our strategy outperforms one trained on a randomly selected subset of pairs.
Keywords
Entity Resolution, Data Matching, ExplainableDM, Pair Selection, Benchmark