UpSet: Visualizing Intersecting Sets

Interactive set visualization for more than three sets.

Understanding relationships between sets is an important analysis task. The major challenge in this context is the combinatorial explosion of the number of set intersections if the number of sets exceeds a trivial threshold. To address this, we introduce UpSet, a novel visualization technique for the quantitative analysis of sets, their intersections, and aggregates of intersections.
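To give a rough sense of this explosion: each element is either inside or outside each of n sets, so n sets can produce up to 2^n distinct membership patterns, counting the pattern "in none of the sets". The helper name below is ours, chosen purely for illustration.

```python
def possible_intersections(n_sets: int) -> int:
    """Count membership patterns over n sets, including 'in none of the sets'."""
    return 2 ** n_sets


for n in (3, 6, 10, 20):
    print(n, possible_intersections(n))
# 3 8
# 6 64
# 10 1024
# 20 1048576
```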

UpSet Screenshot

UpSet is focused on creating task-driven aggregates, communicating the size and properties of aggregates and intersections, and a duality between the visualization of the elements in a dataset and their set membership. UpSet visualizes set intersections in a matrix layout. The matrix layout enables the effective representation of associated data, such as the number of elements in the aggregates and intersections, as well as additional summary statistics.

Sorting according to various measures enables a task-driven analysis of relevant intersections and aggregates. The elements represented in the sets and their associated attributes are visualized in a separate view. Queries based on containment in specific intersections, aggregates, or driven by attribute filters are propagated between both views. UpSet also introduces several advanced visual encodings and interaction methods to overcome the problems of varying scales and to address scalability.

To get an idea of what UpSet is about, you can watch this 30-second video:

  Download video

Why UpSet?

See this related commentary: Points of view: Sets and intersections. Alexander Lex, Nils Gehlenborg. Nature Methods, vol. 11, no. 8, pp. 779, 2014.

Venn diagrams are a horrible way to visualize intersections of more than three or four sets. The figure below shows an example of a six-set Venn diagram published in Nature that shows the relationship between the banana’s genome and the genomes of five other species.

UpSet Screenshot

While this figure looks fun and generated quite a bit of hype, it is also a terrible visualization. Try to extract any information from it. It’s really hard to trace which intersection involves which sets. It’s not obvious from the visualization which intersection is the biggest - you have to read the labels one by one. This is, unfortunately, not an isolated example, but this particular Venn diagram triggered us to develop UpSet.

UpSet has three guiding principles:

You might ask, how does the banana Venn diagram look in UpSet? Here you go:

UpSet Screenshot

(This figure was created with the UpSet R version.) Granted, that’s a little hard to read because the figure is rather small. But we can simply remove the small intersections, and we get a nice plot which shows us the main features of the data:

UpSet Screenshot

Notice how easy it is to see trends: the vast majority of genes are shared between all plants, the first three species seem to be highly related, while the fifth species (Phoenix dactylifera) is most different from the others.

UpSet concept

UpSet plots the intersections of sets as a matrix, as shown in the figure on the right. Each column corresponds to a set, and each row corresponds to one segment in a Venn diagram, as indicated in the figure. Cells are either empty (light-gray), indicating that this set is not part of that intersection, or filled, showing that the set participates in the intersection. The first row in the figure is completely empty - it corresponds to all the elements that are in none of the sets. The second row corresponds to the elements that are only in set A (not in B or C), and so on.
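The rows of such a matrix can be sketched in a few lines of Python. This is only an illustration of the idea, not how UpSet itself is implemented. The sets A, B, and C, their elements, and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy data; the set names follow the A/B/C figure.
sets = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {4, 5, 6, 7},
}
universe = set(range(1, 9))  # element 8 belongs to none of the sets


def exclusive_intersections(sets, universe):
    """Map each membership pattern (one bool per set) to its elements."""
    names = list(sets)
    rows = defaultdict(set)
    for element in universe:
        pattern = tuple(element in sets[name] for name in names)
        rows[pattern].add(element)
    return rows


rows = exclusive_intersections(sets, universe)

# One matrix row per pattern: filled or empty cells, then the row size.
for pattern in product([False, True], repeat=len(sets)):
    cells = " ".join("●" if member else "○" for member in pattern)
    print(cells, len(rows.get(pattern, ())))
```

Every element lands in exactly one row, so the row sizes add up to the size of the whole dataset.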

This layout is great, because we can plot the size of the intersections as bar charts right next to the matrix, as you can see in the simple example on the left. This figure shows a Simpsons dataset in UpSet and in a corresponding Venn diagram. We can also sort in many different ways. Here we sort by the degree, i.e., by the number of sets that contribute to an intersection, but we can also dynamically sort by intersection size and other attributes.
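The two sort orders mentioned here can be sketched as follows, assuming intersections are stored as membership patterns with a size. The sizes are made up for illustration and are not the Simpsons dataset.

```python
# Hypothetical intersections: membership pattern over sets (A, B, C) -> size.
sizes = {
    (False, False, False): 1,
    (True, False, False): 2,
    (False, False, True): 2,
    (True, True, False): 1,
    (False, True, True): 1,
    (True, True, True): 1,
}


def degree(pattern):
    """Number of sets that contribute to an intersection."""
    return sum(pattern)


# Sort by degree first, then by descending size within each degree.
by_degree = sorted(sizes, key=lambda p: (degree(p), -sizes[p]))

# Sort by intersection size, largest first.
by_size = sorted(sizes, key=lambda p: sizes[p], reverse=True)

for pattern in by_degree:
    print(degree(pattern), pattern, sizes[pattern])
```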

Aggregation

In many cases, analysts are interested in understanding more complex set relationships than just individual intersections. UpSet addresses this by making use of aggregations. Aggregations summarize multiple intersections according to a specific pattern. The figure on the right shows an aggregation by sets. Note the extra row labeled “A” - it summarizes all of the intersections where A participates, as shown in the corresponding Venn diagram. These aggregations can show data just the same way as individual intersections can, but they can be collapsed to show only the aggregate, as is the case for B and C in the figure.
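As a minimal sketch of aggregation by sets, assume each intersection is a membership pattern with a size. The aggregate for a set then sums every intersection in which that set participates. The data and function name below are hypothetical.

```python
# Hypothetical intersections: membership pattern over sets (A, B, C) -> size.
names = ("A", "B", "C")
sizes = {
    (False, False, False): 1,
    (True, False, False): 2,
    (False, False, True): 2,
    (True, True, False): 1,
    (False, True, True): 1,
    (True, True, True): 1,
}


def aggregate_by_set(sizes, names):
    """Sum the sizes of all intersections in which each set participates."""
    totals = {}
    for index, name in enumerate(names):
        totals[name] = sum(
            size for pattern, size in sizes.items() if pattern[index]
        )
    return totals


print(aggregate_by_set(sizes, names))
# {'A': 4, 'B': 3, 'C': 4}
```

Note that an intersection involving several sets contributes to several aggregates, so these totals do not add up to the number of elements.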

UpSet supports various types of aggregation. The figure on the left, for example, aggregates the Simpsons dataset by degree, but aggregation by sets, pairwise aggregation, and nested aggregation are also possible.
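Aggregation by degree can be sketched in the same spirit: intersections are grouped by how many sets they involve, and each group keeps both its member rows and a total. This is an illustrative sketch on made-up sizes, not the Simpsons data.

```python
from collections import defaultdict

# Hypothetical intersections: membership pattern over sets (A, B, C) -> size.
sizes = {
    (False, False, False): 1,
    (True, False, False): 2,
    (False, False, True): 2,
    (True, True, False): 1,
    (False, True, True): 1,
    (True, True, True): 1,
}


def aggregate_by_degree(sizes):
    """Group intersections by degree; keep member patterns and a total size."""
    groups = defaultdict(lambda: {"patterns": [], "total": 0})
    for pattern, size in sizes.items():
        group = groups[sum(pattern)]
        group["patterns"].append(pattern)
        group["total"] += size
    return dict(groups)


groups = aggregate_by_degree(sizes)
for deg in sorted(groups):
    print(deg, groups[deg]["total"], len(groups[deg]["patterns"]))
```

Keeping the member patterns alongside the total is what would allow a group to be shown either expanded or collapsed.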

Queries

A concept closely related to aggregation is querying: UpSet allows users to define a group of intersections that must, may, or must not include a specific set. The query in the following picture defines a subset of Simpsons characters that are either exclusively male or that have blue hair and aren’t male. The first part of the query (first row) is indicated by two empty circles in the evil and blue hair cells. This part is combined as an “or” with the second part, which is set to “must” for blue hair, “may” for evil, and “must not” for male.

Query Screenshot
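The must / may / must not logic of this query can be sketched as follows. The encoding of clauses as dictionaries and the characters c1 to c5 are our own assumptions for illustration. They are not UpSet's internal representation or the real dataset.

```python
MUST, MAY, MUST_NOT = "must", "may", "must not"


def matches(membership, clause):
    """Check one query row against an element's set membership."""
    for name, rule in clause.items():
        if rule == MUST and not membership[name]:
            return False
        if rule == MUST_NOT and membership[name]:
            return False
    return True  # "may" never rules anything out


def query(membership, clauses):
    """Rows of the query are combined with 'or'."""
    return any(matches(membership, clause) for clause in clauses)


clauses = [
    # Exclusively male: male, but neither evil nor blue-haired.
    {"male": MUST, "evil": MUST_NOT, "blue hair": MUST_NOT},
    # Blue hair and not male; evil is optional.
    {"male": MUST_NOT, "evil": MAY, "blue hair": MUST},
]

# Hypothetical characters with made-up set memberships.
characters = {
    "c1": {"male": True, "evil": False, "blue hair": False},
    "c2": {"male": True, "evil": True, "blue hair": False},
    "c3": {"male": False, "evil": False, "blue hair": True},
    "c4": {"male": False, "evil": True, "blue hair": True},
    "c5": {"male": True, "evil": False, "blue hair": True},
}

selected = [name for name, m in characters.items() if query(m, clauses)]
print(selected)
# ['c1', 'c3', 'c4']
```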

UpSet can also query based on attributes. For example, you could define a query that only includes all Simpsons characters that are older than 18 years.
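An attribute query of this kind is simply a filter over the elements. A minimal sketch, with hypothetical records and field names:

```python
# Hypothetical element records; names and ages are made up for illustration.
characters = [
    {"name": "c1", "age": 10},
    {"name": "c2", "age": 36},
    {"name": "c3", "age": 18},
    {"name": "c4", "age": 81},
]


def older_than(elements, min_age):
    """Keep only the elements whose age is strictly above min_age."""
    return [element for element in elements if element["age"] > min_age]


adults = older_than(characters, 18)
print([element["name"] for element in adults])
# ['c2', 'c4']
```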

Visualizing Attributes

UpSet visualizes numerical attributes of the intersections and aggregates as boxplots in line with the matrix rows (see image below). Additional attributes can be visualized for selections in the Element View, for example, in scatterplots or histograms. The figure below shows two queries, a violet and a green one. The green query is active (see the green overlays on the bars, the green table header and the green dots in the scatterplot). The violet query is evident in the scatterplot and is indicated with triangles on the bars.

UpSet Screenshot

The elements of the active selection are shown in a scrollable table.
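The numbers behind a per-row boxplot are the usual five-number summary. The sketch below computes it per intersection using Python's standard library. The ages and intersection labels are hypothetical, and the quartile method is one of several common conventions, not necessarily the one UpSet uses.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, quartiles, and maximum: the numbers a boxplot draws."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical ages, grouped by the intersection the elements fall into.
ages_by_intersection = {
    "A only": [10, 20, 30, 40, 50],
    "A and B": [8, 34, 36, 39],
}

for label, ages in ages_by_intersection.items():
    print(label, five_number_summary(ages))
```

This assumes each intersection holds at least two values, since `quantiles` needs two or more data points.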

More Information

For more details on the concept please refer to the paper on UpSet or watch this video introducing the user interface:

  Download video

In summary, if you want to visualize intersections of two or three sets - use a Venn diagram, everyone knows them. For anything above three (and below ~40) sets - use UpSet!

UpSetR - Creating UpSet plots in R

Many scientists use R as part of their analysis workflow. To allow those analysts to easily produce high-resolution figures of set intersections within their workflow that can be used in publications, we have developed an R version of UpSet.

UpSet Screenshot

UpSetR has many of the features of our interactive UpSet plots. Specifically, it comes with various ways to sort and filter intersections and can plot attributes of the elements in the various sets. The layout is slightly adapted - intersections are plotted horizontally instead of vertically, which is beneficial for the typical aspect ratios found in papers. UpSetR does not include the aggregation features of UpSet, does not provide summary statistics about the intersections in line with the set cardinality, and does not provide access to the individual items.

To learn more about UpSetR visit the source code repository which includes documentation on usage, or check out the released versions on CRAN, or try the UpSetR shiny app.

pyUpSet - Creating UpSet plots in Python

pyUpSet has a similar use case to UpSetR but is developed for Python. While UpSetR is directly influenced by Caleydo team members, pyUpSet is developed independently, yet we appreciate the port. pyUpSet is available on GitHub.

Contact

If you have any questions, please e-mail us. If you found a bug, you can directly report it at the GitHub project site.

Acknowledgements

We wish to thank our collaborators, Anne Mai Wassermann, Soohyun Lee, Michele Coscia and Frank Neffke for their time and expertise. We also thank Bilal Alsallakh, Silvia Miksch and the whole Radial Sets team for providing feedback and datasets.

Explore other set visualization techniques at http://setviz.net/

UpSet is supported in part by the Austrian Science Fund (J 3437-N15), the Air Force Research Laboratory and DARPA grant FA8750-12-C-0300 and the United States NIH/National Human Genome Research Institute (K99 HG007583).

UpSet uses the D3 library for visualization. The music in the preview video is by Roulet, “I Can Make This”, licensed under Creative Commons.
