Banner image courtesy of Kirk Goldsberry


Registration at .

Sun. Oct 16, 2022, 9am - 12pm (CDT)

Program Coming Soon

VDS @ KDD 2022 (Past)

Conference registration at

Sun. Aug 14, 2022, 1pm - 4:10pm (EDT)
1:00pm~1:10pm EDT
1:10pm~2:10pm EDT
Kalyan Veeramachaneni
Keynote: Towards Usable Machine Learning
Kalyan Veeramachaneni, Massachusetts Institute of Technology
Abstract: TBD

Bio: Dr. Kalyan Veeramachaneni is a Principal Research Scientist at the Laboratory for Information and Decision Systems (LIDS) at MIT. He directs a research group called Data to AI in the new MIT Schwarzman College of Computing. His group focuses on building large-scale AI systems that work alongside humans, continuously learning from data, generating predictions, and integrating those predictions into human decision-making. The group develops foundational algorithms, abstractions, and systems to enable these three tasks at scale. Algorithms, systems, and open-source software developed by the group are deployed in the financial, medical, and education sectors. Kalyan co-founded PatternEx (acquired by Corelight), a cybersecurity company that adapts machine learning models based on real-time analyst feedback, and FeatureLabs (acquired by Alteryx), a data science automation company. He is currently a co-founder of DataCebo, which focuses on improving data access and availability through synthetic data generation. Kalyan has authored over 70 publications, and his work on AI-driven solutions for data science and cybersecurity has been covered by major media outlets, including the Washington Post, CBS News, Wired, Forbes, and Newsweek. He received his master's degree in Computer Engineering and his Ph.D. in Electrical Engineering in 2009, both from Syracuse University. He joined MIT in 2009.
2:20pm~3:00pm EDT
Paper Session
2:20pm~2:40pm EDT
Yuncong Yu, Dylan Kruyff, Jiao Jiao, Tim Becker and Michael Behrisch
Abstract: We present PSEUDo, a visual pattern retrieval tool for multivariate time series. It aims to overcome the costly (re-)training problem that accompanies deep learning-based methods. Very high-dimensional time series are emerging on an unprecedented scale due to increasing sensor usage and data storage, and visual pattern search is one of the most frequent tasks performed on them. Automatic pattern retrieval methods often suffer from insufficient training data, a lack of ground-truth labels, and a discrepancy between the similarity perceived by the algorithm and the similarity required by the user or the task. Our approach uses query-aware locality-sensitive hashing to create a representation of multivariate time series windows. It features sub-linear training and inference time with respect to data dimensions. This performance gain allows instantaneous, relevance-feedback-driven adaptation that converges to the user's notion of similarity. We demonstrate PSEUDo's performance in terms of accuracy, speed, steerability, and usability through quantitative benchmarks against representative time series retrieval methods and a case study. We find that PSEUDo detects patterns in high-dimensional time series efficiently, improves its results with relevance feedback through feature selection, and enables an understandable and user-friendly retrieval process.
2:40pm~3:00pm EDT
Wei Han, Yangqiming Wang, Christian Boehm and Junming Shao
Abstract: Although deep neural networks perform well on a variety of tasks, their poor interpretability has long been criticized. In this paper, we propose a new interpretable neural network method that embeds neurons into a semantic space to extract their intrinsic global semantics. In contrast to previous methods that probe latent knowledge inside the model, the proposed semantic vectors externalize the latent knowledge as static knowledge, which is easy to exploit. Specifically, we assume that neurons with similar activations carry similar semantic information. The semantic vectors are then optimized by continuously aligning activation similarity with semantic vector similarity during the training of the neural network. Visualizing the semantic vectors provides a qualitative explanation of the neural network. Moreover, we assess the static knowledge quantitatively through knowledge distillation tasks. Visualization experiments show that semantic vectors describe neuron activation semantics well. Without sample-by-sample guidance from the teacher model, static knowledge distillation achieves performance comparable or even superior to that of existing relation-based knowledge distillation methods.
3:00pm~3:10pm EDT
3:10pm~4:10pm EDT
Closing Keynote
Leo Zhicheng Liu
Keynote: Towards Scalable and Interpretable Visual Analytics
Leo Zhicheng Liu, University of Maryland, College Park
Abstract: Knowledge discovery on large-scale complex data is challenging. Not only do we need to devise efficient methods to extract insights, we must also enable users to interpret and trust the automated results and to incorporate their domain knowledge into them. How do we combine data mining, machine learning, and interactive visualization to address this problem? In this talk, I will review related research projects on exploring, summarizing, and modelling temporal event sequence data across various application domains. Through our investigation, we identify symbiotic relationships between automated algorithms and visualizations: data mining and machine learning techniques suggest salient patterns and predictions to visualize, while visualizations can support data analysis across multiple levels of granularity, uncover potential limitations in automated approaches, and inspire new algorithms and techniques. Reflecting on past experiences, I will discuss challenges and opportunities in tightly coupling automated algorithms with interactive visual interfaces for effective knowledge discovery.

Bio: Dr. Zhicheng Liu is an assistant professor in the Department of Computer Science at the University of Maryland. His research focuses on scalable methods to represent and interact with complex data, as well as techniques and systems to support the design and authoring of expressive data visualizations. Before joining UMD, he worked as a research scientist at Adobe Research and as a postdoctoral fellow at Stanford University. He obtained his PhD at Georgia Tech. His work has been recognized with a Test-of-Time award at IEEE VIS, and with multiple Best Paper Awards and Honorable Mentions at ACM CHI and IEEE VIS.