Banner image courtesy of Kirk Goldsberry

Monday, October 26, 2020 (Time Zone: Mountain Daylight Time UTC-6)
Session 1: Keynote and Three Papers
Finale Doshi-Velez
Keynote: Interpretability and Human Validation of Machine Learning
Finale Doshi-Velez, Harvard University
Abstract: As machine learning systems become ubiquitous, there is a growing interest in interpretable machine learning -- that is, systems that can provide human-interpretable rationale for their predictions and decisions. In this talk, I'll first give examples of why interpretability is needed in some of our work in machine learning for health, discussing how human input (which would be impossible without interpretability) is crucial for getting past fundamental limits of statistical validation. Next, I'll speak about some of the work we are doing to understand interpretability more broadly: what exactly is interpretability, and how can we assess it? By formalizing these notions, we can hope to identify universals of interpretability and also rigorously compare different kinds of systems for producing algorithmic explanations. Includes joint work with Been Kim, Andrew Ross, Mike Wu, Michael Hughes, Menaka Narayanan, Sam Gershman, Emily Chen, Jeffrey He, Isaac Lage, Roy Perlis, Tom McCoy, Gabe Hope, Leah Weiner, Erik Sudderth, Sonali Parbhoo, Marzyeh Ghassemi, Pete Szolovits, Mornin Feng, Leo Celi, Nicole Brimmer, Tristan Naumann, Rohit Joshi, Anna Rumshisky, Omer Gottesman, Emma Brunskill, Yao Liu, Joe Futoma, and the Berkman Klein Center.

Bio: Finale Doshi-Velez is a John L. Loeb associate professor in Computer Science at the Harvard Paulson School of Engineering and Applied Sciences. She completed her MSc from the University of Cambridge as a Marshall Scholar, her PhD from MIT, and her postdoc at Harvard Medical School. Her interests lie at the intersection of machine learning, healthcare, and interpretability.
Passing the Data Baton: A Retrospective Analysis of Data Science Work and Workers
Anamaria Crisan, Britta Fiore-Gartland, Melanie Tory
Abstract: Data science is a rapidly growing discipline, and organizations increasingly depend on data science work. Yet the ambiguity around data science, what it is, and who data scientists are can make it difficult for visualization researchers to identify impactful research trajectories. We have conducted a retrospective analysis of data science work and workers as described within the data visualization, human-computer interaction, and data science literature. From this analysis we synthesize a comprehensive model that describes data science work and break data scientists down into nine distinct roles. We summarise and reflect on the role that visualization has throughout data science work and the varied needs of data scientists themselves for tooling support. Our findings are intended to arm visualization researchers with a more concrete framing of data science, with the hope that it will help them surface innovative opportunities for impacting data science work.
LEGION: Visually compare modeling techniques for regression
Subhajit Das, Alex Endert
Abstract: People construct machine learning (ML) models for various use cases, such as in healthcare, financial modeling, etc. In doing so, they aim to improve a model's performance by adopting various strategies, such as changing the input data, tuning model hyperparameters, or performing feature engineering. However, how would users know which of these model construction strategies to adopt for their problem? This paper addresses the problem of how to construct models and how to select a modeling strategy by allowing users to compare incoherencies between multiple regression models (constructed by two different modeling strategies) and thereby learn not only about the models but also about their data. We present LEGION, a visual analytic tool that helps users compare and select regression models constructed either by tuning their hyperparameters or by feature engineering methods. We also present two use cases on real-world datasets validating the utility and effectiveness of our tool.
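The core comparison the abstract describes can be illustrated with a minimal sketch (this is not the LEGION tool; the models, data, and threshold below are invented for illustration): fit two regression models by different strategies, then surface per-point "incoherencies" where their predictions disagree.

```python
# Illustrative sketch: two modeling strategies for the same data, and the
# points where their predictions diverge beyond a threshold.

def fit_mean(ys):
    """Strategy A: a trivial baseline that always predicts the mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_least_squares(xs, ys):
    """Strategy B: ordinary least squares on the raw feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def incoherencies(model_a, model_b, xs, threshold):
    """Indices where the two models' predictions disagree by > threshold."""
    return [i for i, x in enumerate(xs)
            if abs(model_a(x) - model_b(x)) > threshold]

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.1, 2.9, 4.1]
model_a = fit_mean(ys)
model_b = fit_least_squares(xs, ys)
disagree = incoherencies(model_a, model_b, xs, threshold=1.5)  # -> [0, 4]
```

A tool like LEGION visualizes such disagreements interactively; here they are simply listed as indices so the idea of comparing two construction strategies stays visible.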
VIMA: Modeling and visualization of high dimensional machine sensor data leveraging multiple sources of domain knowledge
Joscha Eirich, Dominik Jäckle, Tobias Schreck, Jakob Bonart, Oliver Posegga, Kai Fischbach
Abstract: The highly integrated design of the electrified power train creates new challenges in the holistic testing of high-quality standards. Test technicians face the challenge that tests for such new technologies are just about to be developed. Thus, they cannot rely on their gut feeling, but require automated support, which is not yet available. We present VIMA, a system that processes high-dimensional machine sensor data to support test technicians with their analyses of produced parts and to interactively create labels. We demonstrate the usefulness of VIMA in a qualitative user study with four test technicians. The results indicate that VIMA helps to identify abnormal parts that were not detected by the established testing procedures. Additionally, we use the labels generated interactively through VIMA to deploy a model running on a test station in a real manufacturing environment; the model outperforms the current testing procedure in detecting increased backlashes of electrical engines.
Session 2: Keynote and Three Papers
Jessica Hullman
Keynote: Why Interactive Analysis Needs Theories of Inference
Jessica Hullman, Northwestern University
Abstract: Data analysis is a decidedly human task. As Tukey and Wilk once wrote, “Nothing—not the careful logic of mathematics, not statistical models and theories, not the awesome arithmetic power of modern computers—nothing can substitute here for the flexibility of the informed human mind.” Research in supporting interactive and exploratory analysis has produced a number of sophisticated interfaces, many of which are optimized for easy pattern finding and data "exposure." However, visualization tools are often used by analysts and others to make inferences beyond the data, and as my own and others' research has shown, these inferences often deviate from the predictions of statistical inference. I'll describe how an absence of theories of inference that ground our understanding of how to design for interactive analysis may threaten the validity of conclusions people draw from visualizations, and describe what we've learned by using theories of statistical inference to better understand and design for intuitive visual analysis.

Bio: Jessica Hullman is an Associate Professor of Computer Science with a joint appointment in the Medill School of Journalism at Northwestern University. Her research looks at how to design, evaluate, coordinate, and think about representations of data for amplifying cognition and decision making. She co-directs the Midwest Uncertainty Collective, a lab devoted to better representations, evaluations, and theory around how to communicate uncertainty in data, with Matt Kay. Jessica is the recipient of a Microsoft Faculty Fellowship, NSF CAREER Award, and multiple best papers at top visualization and human-computer interaction conferences, among other awards.
dg2pix: Pixel-Based Visual Analysis of Dynamic Graphs
Eren Cakmak, Dominik Jäckle, Tobias Schreck, Daniel Keim
Abstract: Presenting long sequences of dynamic graphs remains challenging due to the underlying large-scale and high-dimensional data. We propose dg2pix, a novel pixel-based visualization technique, to visually explore temporal and structural properties in long sequences of large-scale graphs. The approach consists of three main steps: (1) the multiscale modeling of the temporal dimension; (2) unsupervised graph embeddings to learn low-dimensional representations of the dynamic graph data; and (3) an interactive pixel-based visualization to explore the evolving data at different temporal aggregation scales simultaneously. dg2pix provides a scalable overview of a dynamic graph, supports the exploration of long sequences of high-dimensional graph data, and enables the identification and comparison of similar temporal states. We show the applicability of the technique to synthetic and real-world datasets, demonstrating that temporal patterns in dynamic graphs can be easily identified and interpreted over time. dg2pix thus contributes a suitable intermediate representation between node-link diagrams at the high-detail end and matrix representations at the low-detail end.
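Step (1), the multiscale modeling of the temporal dimension, can be sketched in a few lines (this is not the dg2pix implementation; the snapshot representation as edge-weight dictionaries is an assumption made for illustration): each coarser level merges pairs of adjacent graph snapshots by summing their edge weights, halving the sequence length per level.

```python
# Illustrative sketch: build coarser temporal aggregation levels of a
# dynamic graph by pairwise-merging adjacent snapshots (a trailing odd
# snapshot is simply dropped in this sketch).

def aggregate_level(snapshots):
    """Merge snapshots pairwise, summing edge weights."""
    merged = []
    for i in range(0, len(snapshots) - 1, 2):
        a, b = snapshots[i], snapshots[i + 1]
        m = dict(a)
        for edge, w in b.items():
            m[edge] = m.get(edge, 0) + w
        merged.append(m)
    return merged

# Four snapshots of a tiny graph; edges are (source, target) -> weight.
level0 = [{("u", "v"): 1}, {("u", "v"): 2, ("v", "w"): 1},
          {("v", "w"): 3}, {("u", "w"): 1}]
level1 = aggregate_level(level0)  # two merged snapshots
level2 = aggregate_level(level1)  # one snapshot covering all four steps
```

In dg2pix each such aggregated graph is then embedded into a low-dimensional vector (step 2) and rendered as a pixel column (step 3), so all levels can be inspected side by side.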
ContiMap: Continuous Heatmap for Large Time Series Data
Vung Pham, Ngan V. T. Nguyen, Tommy Dang
Abstract: Limited human cognitive load, limited computing resources, and finite display resolutions are the major obstacles for developing interactive visualization systems in large-scale data analysis. Recent technological innovation has significantly improved computing power, such as faster CPUs and GPUs, as well as display resources, including ultra-high-resolution displays and video walls. However, the scale and complexity of data continue to outpace these advances, as we generate huge amounts of data daily. Our strategy to bridge these gaps is to present the right amount of information through the use of compelling graphics. This paper proposes an approximation algorithm and a web prototype for representing a high-level abstraction of time series based on heatmap designs. Our approach aims to handle a significant amount of time series data arising from various application domains, such as cybersecurity, sensor networks, and gene expression analysis.
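The general idea behind such a heatmap abstraction can be sketched as follows (this is not the ContiMap algorithm; the mean aggregate and bucket layout are assumptions for illustration): a long time series is reduced to a fixed number of cells, each storing an aggregate of the raw samples it covers, so the display cost depends on the grid size rather than the series length.

```python
# Illustrative sketch: downsample a long time series into a fixed number
# of heatmap cells, each holding the mean of the samples it covers.

def to_heatmap_cells(series, n_cols):
    """Reduce `series` to `n_cols` aggregated cells (mean per bucket)."""
    n = len(series)
    cells = []
    for c in range(n_cols):
        lo = c * n // n_cols        # bucket boundaries cover the series
        hi = (c + 1) * n // n_cols  # without gaps or overlaps
        bucket = series[lo:hi]
        cells.append(sum(bucket) / len(bucket))
    return cells

series = list(range(12))             # 12 raw samples
cells = to_heatmap_cells(series, 4)  # 4 heatmap cells
```

Mapping each cell value to a color then yields one heatmap row per series; stacking many such rows gives the dense overview the paper targets.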
Visualizing and Analyzing Disputed Areas in Soccer
Jules Allegre, Romain Vuillemot
Abstract: Space ownership models assign 2D areas to individuals, based on their ability to reach locations according to their direction and speed. In this paper, we investigate the case where two or more individuals can reach a given location simultaneously. We refer to those locations as disputed areas, as there is tension and uncertainty over ownership; they are an important spatial analysis tool, e.g., in sports, where players share a space with adversaries. We present the process to calculate those disputed areas from existing space ownership models, and introduce several visualizations and analyses of those areas using sports tracking data from Liverpool's 2019 goals. Those areas have been particularly insightful for understanding assists, the ultimate pass that is critical for a team to score. We also report feedback from experts both on the relevance of those areas and on their visual design.
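The disputed-area notion can be sketched under a deliberately simplified ownership model (not the one used in the paper, which also accounts for direction; the constant speed and the `eps` tolerance below are invented for illustration): each player's time to reach a location is distance divided by a constant speed, and a location is disputed when the two fastest players arrive within `eps` seconds of each other.

```python
# Illustrative sketch: flag a pitch location as "disputed" when the two
# fastest players can reach it at (nearly) the same time.

import math

def time_to_reach(player, point, speed=5.0):
    """Arrival time under a constant-speed model (meters / m-per-s)."""
    return math.dist(player, point) / speed

def disputed(players, point, eps=0.2):
    """True when the two earliest arrival times differ by at most eps."""
    times = sorted(time_to_reach(p, point) for p in players)
    return times[1] - times[0] <= eps

players = [(0.0, 0.0), (10.0, 0.0)]
midline = disputed(players, (5.0, 0.0))  # equidistant -> disputed
near_a = disputed(players, (1.0, 0.0))   # clearly owned by one player
```

Evaluating `disputed` over a grid of pitch locations yields the disputed regions as a mask; the paper derives such areas from richer ownership models and visualizes them over tracking data.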