VisOnFire

Visual Analysis of Large and Heterogeneous Scientific Workflows for Analytical Provenance

Over the last few decades, many scientific fields, such as biomedicine and climate research, have been confronted with vast and continuously growing amounts of data. However, the grand challenge is no longer gathering the data but analyzing it. Both the sheer amount of data and its complexity pose significant problems. Today, local solutions are no longer feasible, and large-scale experiments are carried out on powerful server infrastructure as scientific workflows consisting of data transformation and analysis operations. Running such workflows can take hours, days, or even weeks. Misconfiguration, erroneous scripts, and non-converging operations are highly problematic in this respect, as re-running a workflow is costly in both time and money. Moreover, these workflows are created, administered, and changed by potentially large and geographically distributed consortia of researchers. Due to this complexity, it becomes increasingly hard to gain an overview of all processing steps involved and to trace who changed what, where, and how those changes affected (intermediate) results. In many contexts, reproducibility of results generated by complex scientific workflows is crucial. However, a recent study showed that it was not possible to confirm the findings of almost 90% of over 50 cancer genomics studies. Thus, developing novel approaches that realize traceability and reproducibility is of utmost importance.

The key to traceability and reproducibility lies in collecting information about the processed data, the applied operations, and their parameters over time. Modern scientific workflow tools provide analytical provenance, but are mostly restricted to scenarios where a single static input dataset results in a single output dataset. When changes occur in the input data, the workflow itself, and its parameterization, it is hard and tedious, if possible at all, to find out with current technology which changes actually caused variations in the output.

The primary goal of our project is to realize provenance at all levels, allowing analysts to gain a deeper understanding of the workflow, changes applied to it, and how they influence the results. This will be achieved by developing a visual forensic tool for scientific workflows, which includes novel visual analysis methods that allow for a scalable visualization of the workflow and its changes, a visual comparison of complex data structures, and novel change metrics needed to quantify changes in complex data structures.

The methods we develop will help address the issue of reproducibility in published results, which has plagued many scientific communities. Investigators can use our methods to make all or parts of their analysis public, traceable, and reproducible. The provenance visualization and query tools will make it straightforward for scientists to offer a comprehensive description of the analyses performed to obtain their results.

Journal Publications

  • Christina Niederer, Holger Stitz, Reem Hourieh, Florian Grassinger, Wolfgang Aigner, and Marc Streit
    TACO: Visualizing Changes in Tables Over Time
    IEEE Transactions on Visualization and Computer Graphics (InfoVis '17), 2017 (to appear)
    Paper Homepage
  • Holger Stitz, Stefan Luger, Nils Gehlenborg, and Marc Streit
    AVOCADO: Visualization of Workflow-Derived Data Provenance for Reproducible Biomedical Research
    Computer Graphics Forum (EuroVis '16), vol. 35, no. 3, pp. 481-490, 2016
    Paper Homepage
  • Holger Stitz, Samuel Gratzl, Wolfgang Aigner, and Marc Streit
    ThermalPlot: Visualizing Multi-Attribute Time-Series Data Using a Thermal Metaphor
    IEEE Transactions on Visualization and Computer Graphics, 22(12), pp. 2594-2607, 2016
    Paper Homepage

Extended Abstracts and Posters

  • Holger Stitz, Samuel Gratzl, Harald Piringer, and Marc Streit
    Provenance-Based Visualization Retrieval
    IEEE Conference on Visual Analytics Science and Technology (VAST ’17), Phoenix, AZ, USA, 2017
    IEEE VAST 2017 Best Poster Award
    Poster Abstract | Poster | Video
  • Reem Hourieh, Holger Stitz, Nils Gehlenborg, and Marc Streit
    TaCo: Comparative Visualization of Large Tabular Data
    Poster Compendium of the Eurographics/IEEE Symposium on Visualization (EuroVis ’16), Groningen, Netherlands, 2016
    Poster Abstract | Poster | Video
  • Stefan Luger, Holger Stitz, Nils Gehlenborg, and Marc Streit
    Interactive Visualization of Provenance Graphs for Reproducible Biomedical Research
    IEEE Conference on Information Visualization (InfoVis ’15), Chicago, IL, USA, 2015
    IEEE InfoVis 2015 Best Poster Award
    Poster Abstract | Poster
  • Holger Stitz, Samuel Gratzl, Wolfgang Aigner, and Marc Streit
    ThermalPlot: Visualizing Multi-Attribute Time-Series Data Using a Thermal Metaphor
    IEEE Conference on Information Visualization (InfoVis ’15), Chicago, IL, USA, 2015
    IEEE InfoVis 2015 Honorable Mention Poster Award
    Poster Homepage
  • FH St. Pölten
    VisOnFire: Visual Analysis of Large and Heterogeneous Scientific Workflows for Analytical Provenance
    European Researchers Night, 2015
    Poster

Theses

  • Michael Gillhofer | Master's Thesis
    Provenance Graph Based Steering
    Supervision: Prof. Marc Streit
  • Reem Hourieh | Master's Thesis
    Comparative Visualization of Large Tabular Data
    Supervision: Prof. Marc Streit
  • Stefan Luger | Master's Thesis
    Interactive Visualization of Provenance Graphs for Reproducible Biomedical Research
    Supervision: Prof. Marc Streit

You can contact us via marc.streit@jku or wolfgang.aigner@fhstp.ac.at.

The project (P 27975-NBL) is funded by the Austrian Science Fund (FWF).