SDS2020 – Workshops
The following workshops will be held ONLINE on June 25, 2020.
- 9am – 12:30pm: An Experimental Exploratory Data Analysis for a Classification Task
- 9am – 12:30pm: A Hybrid Edge-Cloud Platform for Self-Adaptive Machine Learning Based IoT Applications
- 1:30pm – 5pm: Implementing Data Ethics in Business Processes
- 1:30pm – 5pm: Machine Learning Push-Down to SAP HANA with Python
- 1:30pm – 5pm: NLPeasy – Harnessing the Power of Unstructured Data
Separate registration for the workshops is required. The corresponding rooms will be announced closer to the conference. Workshop capacity is limited; places are allocated on a first-come, first-served basis.
An Experimental Exploratory Data Analysis for a Classification Task
Date: June 25, 2020
Time: 9am – 12:30pm
Workshop Overview
Exploratory Data Analysis (EDA) is typically the step between data cleaning and data modeling; its goal is to understand patterns, detect mistakes, check assumptions and examine relationships between the variables of a data set with the help of graphical charts and summary statistics.
The goal of this workshop is to extend the classical EDA journey, which some tools also provide in an automated way.
The exploration approach is organized around the classification variables: it proceeds from univariate to bivariate analysis, through feature engineering techniques (such as one-hot encoding) that facilitate the machine learning job, to statistical evaluations for selecting the features best able to explain the response variable.
During this process, common issues with the explanatory variables, such as missing observations and outliers, are handled.
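As a taste of these steps, here is a minimal sketch in Pandas; the data set and all column names are made up for illustration, not the competition data used in the workshop:

```python
import pandas as pd

# Tiny stand-in for the competition data (placeholder columns)
df = pd.DataFrame({
    "age":    [22, 35, None, 58, 41, 29],
    "city":   ["Bern", "Zurich", "Bern", "Geneva", None, "Zurich"],
    "target": [0, 1, 0, 1, 1, 0],
})

# Univariate analysis: summary statistics for each variable
print(df.describe(include="all"))

# Bivariate analysis: how a feature relates to the response variable
print(df.groupby("target")["age"].mean())

# Handle missing observations, then one-hot encode the categorical column
df["age"] = df["age"].fillna(df["age"].median())
df = pd.get_dummies(df, columns=["city"], dummy_na=True)
print(df.head())
```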
A deep understanding of the data set is completed by exploring several baseline models, split into two steps: first without handling the imbalanced data set, then with it.
A data set from a classification task competition will be used.
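The two-step baseline exploration can be sketched as follows with scikit-learn; a synthetic imbalanced data set stands in for the competition data, and class reweighting is just one of the imbalance-handling techniques that may be covered:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the competition data: only ~5% positive cases
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Step 1: baseline model without handling the imbalance
plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Step 2: the same model, now reweighting the minority class
balanced = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

for name, model in [("plain", plain), ("balanced", balanced)]:
    print(name, round(f1_score(y_te, model.predict(X_te)), 3))
```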
Target Audience (beginner to intermediate level)
- Data Scientists
- Data Analysts
- Statisticians
- Academics and everyone interested in data science topics
Workshop Prerequisites
- Own laptop
- Basic/Intermediate Python knowledge
- Intermediate Statistics knowledge
- Google account to follow the lesson with Google Colaboratory (https://colab.research.google.com/notebooks/welcome.ipynb)
- Curiosity
Workshop Lessons Learned
- Understand variables by visualization and statistical analysis
- Grab some feature engineering and feature selection techniques
- Select the best model by exploring several candidates and through feature interpretability
- Learn how to handle an imbalanced data set
Organizer

Claudio Giorgio Giancaterino
Actuary & Data Scientist: Aviva Italia Servizi Scarl
A Hybrid Edge-Cloud Platform for Self-Adaptive Machine Learning Based IoT Applications
Date: June 25, 2020
Time: 9am – 12:30pm
Workshop Overview
Cloud computing traditionally serves IoT applications by providing storage for the generated data and the CPU power to turn it into business value. However, the growth of IoT is affecting the way traditional cloud architectures work. The increasing amount of data to be transferred creates bottlenecks and increases latency. Furthermore, sending such large amounts of data to a cloud environment in very short periods of time is inefficient, not to mention cumbersome and expensive. This implies that much of this data must be aggregated at the “end points” where it is collected. And this is where edge computing comes in.
Edge Computing is not devised as a competitor to cloud; it is envisioned as the perfect ally for a broad spectrum of applications for which traditional Cloud Computing is not sufficient. Combining the edge approach with IoT sensors and Cloud would add flexibility and choices for users.
The workshop will be organized in two steps. The first step will answer these questions:
- What problems does edge computing solve?
- How can we take advantage of edge computing?
- How do edge computing and cloud computing work together?
- How can the load be balanced between the edge and the cloud?
- What are the current technologies for designing hybrid (edge, cloud) platforms for secure IoT applications? These technologies will be illustrated by hands-on demonstrations.
The second step will present a generic open-source platform for intelligent IoT applications based on a shareable backbone infrastructure composed of three layers: IoT objects, edge devices and cloud infrastructure. Our framework:
- delivers machine learning models (MLMs) trained in the cloud on data streams collected at the edge from IoT devices;
- supports lightweight learning algorithms that can execute on the edge and self-adapt without any synchronisation with the cloud.
The platform delivers the following functionalities:
- Coordinate application deployment from the cloud to the edge. The platform will target cloud, edge and IoT devices. IoT applications can be deployed, configured, operated and maintained, using a shared infrastructure, where several applications can coexist.
- Continuously integrate, deploy and maintain MLMs on edge devices. Learning, which requires far more resources, will take place in the cloud, and the learned models will be deployed at the edge. This will considerably decrease the response time and the necessary bandwidth between the cloud and edge layers, since real-time data processing will take place close to the IoT devices. With this functionality, the edge triggers the learning process, although it is performed in the cloud.
- Use lightweight, yet powerful, machine learning models that can be set up by, and deployed on, the resource-constrained devices typically found at the edge. This alternative is possible for IoT applications whose learning process is light enough to run on the edge. With this functionality, the edge is more autonomous: its intelligence is “improved” locally (see the sketch below).
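A minimal sketch of this train-in-the-cloud, adapt-at-the-edge pattern, with scikit-learn standing in for the platform's model delivery (all data, feature counts and names are illustrative; the actual open-source platform is presented in the workshop):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(42)

# --- Cloud side: train an initial model on historical IoT data ---
X_hist = rng.normal(size=(1000, 4))            # 4 hypothetical sensor features
y_hist = (X_hist[:, 0] + X_hist[:, 1] > 0).astype(int)
model = SGDClassifier(random_state=0)
model.partial_fit(X_hist, y_hist, classes=np.array([0, 1]))
# The platform would now serialize the model and ship it to the edge device.

# --- Edge side: adapt incrementally, without synchronising with the cloud ---
for _ in range(20):                             # simulated sensor stream
    X_batch = rng.normal(size=(32, 4))
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    model.partial_fit(X_batch, y_batch)         # lightweight local update

print(model.predict(rng.normal(size=(3, 4))))   # real-time inference at the edge
```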
Target Audience
- PhD Students
- Engineers
- Scientists
- Industrials and researchers interested in edge-cloud IoT applications/platforms
- Practitioners involved in the deployment and development of IoT applications
Workshop Prerequisites
Basic cloud computing knowledge
Workshop Lessons Learned
- Understand the added value of the edge compared to a centralized cloud-based solution.
- Browse the most common “technologies” used to deploy hybrid edge-cloud platforms.
- Discover a “Swiss made” open-source technology used to deploy hybrid edge-cloud platforms.
Organizer

Nabil Abdennadher
Professor: University of Applied Sciences Western Switzerland

Marc-Elian Bégin
SixSq CEO, Co-Founder
Francisco Mendonca
HESGE
Implementing Data Ethics in Business Processes
Date: June 25, 2020
Time: 1:30pm – 5pm
Workshop Description
In 2019, the Swiss Alliance for Data-Intensive Services launched a “Data Ethics Code”, whose updated version will be published in early 2020. A follow-up project currently underway will create an implementation guide for the Data Ethics Code. The implementation guide will outline several possibilities for integrating data ethics into business processes. The guide builds on experiences in other domains, such as the healthcare system, where implementing ethics structures has a longer tradition than in data ethics.
In the workshop, the beta-version of the implementation guide will be presented and discussed with company experts (e.g., senior management, organization developers, specialists in compliance and data ethics) whose task is to make data ethics operational in their business processes. It will help decision makers to understand the most important issues for a successful implementation of data ethics in a company.
Target Audience
Senior management, organization development experts, compliance experts, data ethics experts
Workshop Prerequisites
The workshop does not require specific prerequisites, and no laptops are needed: it is designed as an interactive experience whose main methods are design thinking and focus group discussions.
Workshop Lessons Learned
- Gaining insight into tools to implement data ethics in companies
Organizers

Markus Christen
Research Group Leader: Digital Society Initiative of the University of Zurich
PD Dr. sc. ETH, is a research group leader in digital ethics and managing director of the Digital Society Initiative of the University of Zurich. christen@ethik.uzh.ch

Christoph Heitz
Professor: Zurich University of Applied Sciences
Prof. Dr., is co-chair of the Data Ethics Expert Group of the Swiss Alliance for Data-Intensive Services. He does research in the area of algorithmic fairness and data-based decision making in digital services.

Michele Loi
Researcher: Digital Society Initiative, University of Zurich
Dr. phil., is a researcher at the Digital Society Initiative, University of Zurich, and co-chair of the Data Ethics work group of the Data + Service Alliance. He researches the ethics and political philosophy of data and algorithms.
Machine Learning Push-Down to SAP HANA with Python
Date: June 25, 2020
Time: 1:30pm – 5pm
Workshop Overview
Data scientists who work in a business environment often require access to SAP data. This data is often held in a high-performance in-memory appliance that already contains machine learning algorithms.
In this tutorial you will learn how to leverage that in-memory appliance to train machine learning (ML) models from your preferred Python environment and to trigger predictive algorithms in SAP HANA without having to extract the data.
If you are already experienced with Machine Learning, you might be curious how to train ML models directly in SAP HANA from your preferred Python environment. That’s right, leverage the power of SAP HANA without leaving your existing Python framework!
To get hands-on experience we will provide:
- Access to a SAP HANA system
- A Python development environment (JupyterLab)
- Libraries installed in your Python environment that are needed to connect and push down calculation and training logic to SAP HANA
- A dataset to support the use case
- A set of Jupyter Notebooks that have been prepared for you
The notebooks will implement a typical machine learning scenario in which a regression model is trained using SAP HANA's Predictive Analysis Library (PAL).
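As a flavour of what the notebooks do, here is a minimal sketch using SAP's hana_ml Python package; the connection details and the table, key and label names are placeholders, and the workshop provides the real environment:

```python
from hana_ml import dataframe
from hana_ml.algorithms.pal.linear_model import LinearRegression

# Placeholder credentials; workshop participants get access to a HANA system
conn = dataframe.ConnectionContext(address="hana.example.com", port=30015,
                                   user="ML_USER", password="<secret>")

# A hana_ml DataFrame is only a reference to a table: no data is extracted
train = conn.table("TRAIN_DATA")
print(train.describe().collect())    # summary statistics computed inside HANA

# Train a PAL regression model entirely inside SAP HANA (push-down)
lr = LinearRegression()
lr.fit(data=train, key="ID", label="TARGET")

# Score new data in-database; only the small result set is fetched
pred = lr.predict(data=conn.table("TEST_DATA"), key="ID")
print(pred.collect().head())
```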
Target Audience
- Data Scientists
- Python Users
- SAP HANA Users
Workshop Prerequisites
- Basic Python Know-How
- Basic JupyterLab Know-How
- Own laptop with internet access
- Google Chrome Browser
Workshop Lessons Learned
- You will learn how to get hands-on with data located in SAP HANA systems and other data stores
- You will run notebooks that trigger machine learning within SAP HANA
- You will perform data exploration tasks on the source system without transferring data
- You will learn how to use the developed model and bring it into an enterprise-ready environment
Co-Organizers

Andreas Forster
Machine Learning Expert: SAP (Schweiz) AG

Thomas Bitterle
Solution Advisor: SAP (Schweiz) AG

Michael Probst
Solution Advisor: SAP (Schweiz) AG
NLPeasy – Harnessing the Power of Unstructured Data
Date: June 25, 2020
Time: 1:30pm – 5pm
Workshop Overview
Knowledge in most organisations is often available only as unstructured text in e-mails, CRM entries, wiki articles, etc. Harnessing this knowledge and making it available to the users who need it most is a challenging problem. Advances in machine learning and, in particular, NLP open new possibilities for intelligent and efficient knowledge management.
In this hands-on tutorial we will show how to use our open-source software NLPeasy: quickly set up Pandas-based pipelines enhanced with ML methods and pre-trained models (e.g. word embeddings, sentiment analysis). The results can then be saved in Elasticsearch, and Kibana dashboards can be generated automatically to explore the texts and results.
You will be led from installing the necessary tools, through setting up a simple yet powerful NLP pipeline, to ingesting texts into Elasticsearch. You will then explore the generated Kibana dashboards for visualisation and adapt them.
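NLPeasy's own pipeline API is best taken from the workshop repository below; as a rough illustration of the pattern it wraps (enrich a Pandas DataFrame, then bulk-index it into Elasticsearch for Kibana), here is a sketch using plain elasticsearch-py, with a made-up index and toy texts:

```python
import pandas as pd
from elasticsearch import Elasticsearch, helpers

# Toy corpus standing in for e-mails, CRM entries, wiki articles, etc.
df = pd.DataFrame({
    "title": ["Outage report", "Feature praise"],
    "body":  ["The service was down again.", "Great release, the team loves it!"],
})

# Enrichment step: NLPeasy would plug in embeddings, sentiment analysis, etc.
df["n_words"] = df["body"].str.split().str.len()

# Bulk-index the enriched records; Kibana dashboards then visualise the index
es = Elasticsearch("http://localhost:9200")
helpers.bulk(es, ({"_index": "texts", "_source": rec}
                  for rec in df.to_dict(orient="records")))
```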
We will also show how we used this approach successfully in different use cases.
Target Audience
Data Scientists looking for a toolkit to kick start NLP analyses
Workshop Prerequisites
- Basic Pandas and Python
- Own laptop
- Curiosity for processing text data
- No previous NLP knowledge needed
- Completed pre-workshop installation requirements on https://github.com/d-one/NLPeasy-workshop
Workshop Lessons Learned
- Learn about the fundamentals of NLP and how these are wrapped in NLPeasy
- Discover how NLPeasy integrates with Elasticsearch and Kibana
- Learn how quickly you can start your NLP analysis by setting up your first pipeline with NLPeasy
- Learn how to leverage NLPeasy for exploratory analysis of textual data and deep dives
Co-Organizers

Philipp Thomann
Managing Consultant: D ONE

Jacqueline Stählin
Senior Consultant: D ONE