WannaDB: Ad-hoc SQL Queries over Text Collections
Document type
Text/Conference Paper
Date
2023
Publisher
Gesellschaft für Informatik e.V.
Abstract
In this paper, we propose a new system called WannaDB that allows users to interactively perform structured explorations of text collections in an ad-hoc manner. Extracting structured data from text is a classical problem for which a plenitude of approaches and even industry-scale systems already exist. However, these approaches lack the ability to support ad-hoc exploration of texts using structured queries. The main idea of WannaDB is to include user interaction to support ad-hoc SQL queries over text collections using a new two-phased approach. First, a superset of information nuggets is extracted from the texts using existing extractors such as named entity recognizers. Then, based on embeddings, the extractions are interactively matched to a structured table definition requested by the user. In our evaluation, we show that WannaDB is thus able to extract high-quality structured data from a broad range of (real-world) text collections without the need to design extraction pipelines upfront.
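The two-phased idea described in the abstract can be illustrated with a minimal sketch. This is not the WannaDB implementation: the regular-expression patterns stand in for a named entity recognizer, the character-trigram vectors stand in for learned embeddings, and the `confirm` callback stands in for interactive user feedback. All function and variable names here are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical extractor patterns standing in for a named entity recognizer.
PATTERNS = {
    "date": r"\b\d{4}-\d{2}-\d{2}\b",
    "person name": r"\b[A-Z][a-z]+ [A-Z][a-z]+\b",
}


def extract_nuggets(text):
    """Phase 1: extract candidate nuggets as (value, label) pairs."""
    nuggets = []
    for label, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            nuggets.append((match.group(), label))
    return nuggets


def embed(s):
    """Toy embedding: character-trigram counts (an assumption, not a learned model)."""
    padded = f"  {s.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_attribute(attribute, nuggets, confirm=lambda value: True):
    """Phase 2: rank nuggets by similarity of their label to the attribute name.

    `confirm` simulates the user accepting or rejecting a proposed match.
    """
    target = embed(attribute)
    ranked = sorted(nuggets, key=lambda n: cosine(target, embed(n[1])), reverse=True)
    for value, _label in ranked:
        if confirm(value):
            return value
    return None


doc = "Anna Schmidt landed in Frankfurt on 2023-03-06."
nuggets = extract_nuggets(doc)
# Fill one row of a user-requested table with the attributes "name" and "date".
row = {attr: match_attribute(attr, nuggets) for attr in ("name", "date")}
print(row)
```

In this sketch, a rejected proposal simply falls through to the next-best nugget; a real system would presumably also use the feedback to adjust later matches, which is left out here.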
Keywords
interactive text exploration, text to table, matching embeddings