P331 - BTW2023 - Datenbanksysteme für Business, Technologie und Web

Permanent URI for this collection: https://dl.gi.de/handle/20.500.12116/40312

Search Results

  • Conference Paper
    MLProvCodeGen: A Tool for Provenance Data Input and Capture of Customizable Machine Learning Scripts
    (Gesellschaft für Informatik e.V., 2023) Mustafa, Tarek Al; König-Ries, Birgitta; Samuel, Sheeba; König-Ries, Birgitta; Scherzinger, Stefanie; Lehner, Wolfgang; Vossen, Gottfried
    Over the last decade, machine learning (ML) has dramatically changed the application of and research in computer science. It has become increasingly complicated to ensure the transparency and reproducibility of advanced ML systems from raw data to deployment. In this paper, we describe an approach that gives users an interface for specifying a variety of parameters that together provide complete provenance information, and that automatically generates executable ML code from this information. We introduce MLProvCodeGen (Machine Learning Provenance Code Generator), a JupyterLab extension that generates custom code for ML experiments from user-defined metadata. ML workflows can be generated with different data settings, model parameters, methods, and training parameters, and their results can be reproduced in Jupyter Notebooks. We evaluated our approach with two ML applications, image and multiclass classification, and conducted a user evaluation.
  • Conference Paper
    MLProvLab: Provenance Management for Data Science Notebooks
    (Gesellschaft für Informatik e.V., 2023) Kerzel, Dominik; König-Ries, Birgitta; Sheeba, Samuel; König-Ries, Birgitta; Scherzinger, Stefanie; Lehner, Wolfgang; Vossen, Gottfried
    Computational notebooks are a form of computational narrative fostering reproducibility. They provide an interactive computing environment where users can run and modify code and repeat the exploration, providing an iterative communication between data scientists and code. While the ability to execute notebooks non-linearly benefits data scientists during exploration, the drawback is that it is possible to lose control over the datasets, variables, and methods defined in the notebook and their dependencies. Thus, in this process of user interaction and exploration, execution history information can be lost. To prevent this, a means of maintaining provenance information is needed. Provenance plays a significant role in data science, especially in facilitating the reproducibility of results. To this end, we developed a provenance management tool to help data scientists track, capture, compare, and visualize provenance information in notebook code environments. We conducted an evaluation with data scientists, in which participants were asked to find specific provenance information in the execution history of a machine learning Jupyter notebook. The results from the performance and user evaluation show promising aspects of the tool's provenance management features. The resulting system, MLProvLab, is available as an open-source extension for JupyterLab.
  • Conference Paper
    A Provenance Management Framework for Knowledge Graph Generation in a Web Portal
    (Gesellschaft für Informatik e.V., 2023) Kleinsteuber, Erik; Babalou, Samira; König-Ries, Birgitta; König-Ries, Birgitta; Scherzinger, Stefanie; Lehner, Wolfgang; Vossen, Gottfried
    Knowledge Graphs (KGs) are the semantic backbone for a wide variety of applications in different domains. In recent years, different web portals providing relevant functionalities for managing KGs have been proposed. An important functionality of such portals is provenance data management for the KG generation process. Capturing, storing, and accessing provenance data efficiently are complex problems. Solutions to these problems vary widely depending on many factors, such as the computational environment, computational methods, and desired provenance granularity. In this paper, we present one possible solution: a new framework to capture coarse-grained workflow provenance of KGs during creation in a web portal. We capture the necessary information about the KG generation process, and we store and retrieve the provenance data using standard functionalities of relational databases. Our captured workflow can be rerun over the same or different input source data. With this, the framework can support four different applications of provenance data: (i) reproduce the KG, (ii) create a new KG with an existing workflow, (iii) undo the executed tools and adapt the provenance data accordingly, and (iv) retrieve the provenance data of a KG.
  • Conference Paper
    A Core Ontology to Support Agricultural Data Interoperability
    (Gesellschaft für Informatik e.V., 2023) Abdelmageed, Aly; Hatem, Shahenda; ael, Tasneem; Medhat, Walaa; König-Ries, Birgitta; Ellakwa, Susan; Elkafrawy, Passent; Algergawy, Alsayed; König-Ries, Birgitta; Scherzinger, Stefanie; Lehner, Wolfgang; Vossen, Gottfried
    The amount and variety of raw data generated in the agriculture sector from numerous sources, including soil sensors and local weather stations, are proliferating. However, these raw data in themselves are meaningless and isolated and, therefore, may offer little value to the farmer. The usefulness of data is determined by its context and meaning and by how well it interoperates with data from other sources. Semantic web technology can provide context and meaning to data and its aggregation by providing standard data interchange formats and description languages. In this paper, we introduce the design and overall description of a core ontology that facilitates data interoperability in the agricultural domain.
  • Conference Paper
    Semantic Search for Biological Datasets: A Usability Study on Modes of Querying and Explaining Search Results
    (Gesellschaft für Informatik e.V., 2023) Löffler, Felicitas; Shafiei, Fateme; Witte, René; König-Ries, Birgitta; Klan, Friederike; König-Ries, Birgitta; Scherzinger, Stefanie; Lehner, Wolfgang; Vossen, Gottfried
    Dataset discovery is a frequent task in daily research practice, yet studies are missing that explore the usability of user interfaces (UI) in data portals. In particular, very few user studies exist that analyze whether particular elements in the user interface are useful for search tasks. We aim to address this need for more specific usability evaluations in dataset search. In this work, we present a flexible semantic search over biological datasets with two user interfaces. The search result contains semantically related terms, such as synonyms or more specific terms, obtained from domain ontologies. We evaluated the system in a user study with 20 scholars. We focused on two components: we compared a query input that supports searching in categories (entity types) with a single input field, and we analyzed textual highlighting in the returned datasets to study whether users are distracted by semantic information such as URIs. Our results show that users prefer interfaces with a single input field for search tasks they are not familiar with, and that users appreciate explanations with terminologies and URIs.
  • Conference Paper
    Fourth Workshop on Big (and Small) Data in Science and Humanities (BigDS)
    (Gesellschaft für Informatik e.V., 2023) Henrich, Andreas; Karam, Naouel; König-Ries, Birgitta; Seeger, Bernhard; König-Ries, Birgitta; Scherzinger, Stefanie; Lehner, Wolfgang; Vossen, Gottfried
    The Workshop on Big (and Small) Data in Science and Humanities addresses all issues related to the management and analysis of data in science from the perspective of different application areas. The goal is the scientific and interdisciplinary exchange on these topics. In the joint discussion, current and future challenges in the processing and analysis of data and possible future technologies are to be identified. This year's edition will emphasize the topics of FAIR Data and NFDI.