SDS2020 - Workshops

The following workshops will be held ONLINE on June 25, 2020.

  • 9am – 12:30pm:  An Experimental Exploratory Data Analysis for a Classification Task
  • 9am – 12:30pm: A Hybrid Edge-Cloud Platform for Self-Adaptive Machine Learning Based IoT Applications
  • 1:30pm – 5pm: Implementing Data Ethics in Business Processes
  • 1:30pm – 5pm: Machine Learning Push-Down to SAP HANA with Python
  • 1:30pm – 5pm: NLPeasy – Harnessing the Power of Unstructured Data

Separate registration for the workshops is required. The corresponding rooms will be announced closer to the conference. Workshops are limited in capacity, and places are allocated on a first-come, first-served basis.

An Experimental Exploratory Data Analysis for a Classification Task

Date: June 25, 2020
Time: 9am – 12:30pm

Workshop Overview

Exploratory Data Analysis (EDA) usually sits between data cleaning and data modeling. Its goal is to understand patterns, detect mistakes, check assumptions and examine relationships between the variables of a data set with the help of charts and summary statistics.

The goal of this workshop is to extend the classical EDA journey, which some tools also provide in an automated way.

The exploration is grouped by the classification variable: it proceeds from univariate to bivariate analysis, and then from feature engineering techniques (such as one-hot encoding), used to ease the machine learning task, to statistical evaluations that select the features best able to explain the response variable.
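As an illustration of two of the steps named above, here is a minimal sketch of one-hot encoding and of a univariate summary grouped by the class label. The records, column names and helper functions are hypothetical and are not taken from the workshop material. Only the Python standard library is used so that the sketch runs anywhere; a data-frame library would be the more usual choice in practice.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: one numeric feature, one categorical feature,
# and a binary class label.
rows = [
    {"age": 25, "channel": "web", "label": 0},
    {"age": 40, "channel": "agent", "label": 1},
    {"age": 31, "channel": "web", "label": 0},
    {"age": 52, "channel": "phone", "label": 1},
]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


def mean_by_class(rows, feature, target="label"):
    """Univariate summary of a numeric feature, grouped by the class label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[target]].append(row[feature])
    return {cls: mean(values) for cls, values in groups.items()}


encoded_rows = one_hot(rows, "channel")
age_by_class = mean_by_class(rows, "age")
print(encoded_rows[0])
print(age_by_class)
```

A clear gap between the per-class means, as in this toy data, is the kind of signal that suggests a feature may help explain the response variable.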

Issues with the explanatory variables, such as missing observations and outliers, are handled during this process.
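One common way to deal with both issues is median imputation for missing values and the interquartile-range (IQR) rule for flagging outliers. The sketch below assumes these two techniques purely for illustration; the workshop may use different ones, and the sample values are invented.

```python
from statistics import median, quantiles


def impute_median(values):
    """Fill missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical feature with one missing observation and one extreme value.
incomes = [30, 32, None, 35, 31, 33, 200]
filled = impute_median(incomes)
outliers = iqr_outliers(filled)
print(filled)
print(outliers)
```

The median is used rather than the mean because the extreme value would otherwise distort the imputed figure.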

A deep understanding of the data set is completed by exploring several baseline models in two steps: first without handling the imbalance in the data set, then with handling it.
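The sketch below shows why the imbalance matters and two generic ways to address it: class weights inversely proportional to class frequency, and random oversampling of the minority class. The 90/10 label split and all function names are assumptions made for this example; they do not describe the workshop's data set or the models it explores.

```python
import random
from collections import Counter

# Hypothetical imbalanced labels: 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10


def majority_baseline(labels):
    """Predict the most frequent class for every sample."""
    majority = Counter(labels).most_common(1)[0][0]
    return [majority] * len(labels)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that were predicted as positive."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return hits / actual


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    return {cls: len(labels) / (len(counts) * count) for cls, count in counts.items()}


def random_oversample(labels, seed=0):
    """Return sample indices in which every class appears as often as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    indices = list(range(len(labels)))
    for cls, count in counts.items():
        pool = [i for i, label in enumerate(labels) if label == cls]
        indices.extend(rng.choices(pool, k=target - count))
    return indices


predictions = majority_baseline(labels)
baseline_accuracy = accuracy(labels, predictions)
baseline_recall = recall(labels, predictions)
weights = balanced_class_weights(labels)
resampled_counts = Counter(labels[i] for i in random_oversample(labels))
print(baseline_accuracy, baseline_recall, weights, resampled_counts)
```

The majority baseline looks accurate while never finding a positive case, which is why the models are compared both before and after the imbalance is handled.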

A data set from a classification competition will be used.

Target Audience (from a beginner level to intermediate level)
  • Data Scientists
  • Data Analysts
  • Statisticians
  • Academics and everyone interested in data science topics
Workshop Prerequisites
Workshop Lessons Learned 
  • Understand variables by visualization and statistical analysis
  • Pick up some feature engineering and feature selection techniques
  • Select the best model by exploring candidate models and considering feature interpretability
  • Learn how to handle an imbalanced data set
Organizer
Claudio Giorgio Giancaterino

Actuary & Data Scientist: Aviva Italia Servizi Scarl

A Hybrid Edge-Cloud Platform for Self-Adaptive Machine Learning Based IoT Applications

Date: June 25, 2020
Time: 9am – 12:30pm

Workshop Overview

Cloud computing traditionally serves IoT applications by providing storage for generated data, and CPU power to produce value for their businesses. However, the growth of IoT is affecting the way traditional cloud architectures work. The increased amount of data to be transferred is creating bottlenecks while increasing the latency. Furthermore, sending such a large amount of data to a cloud environment in very short periods of time is inefficient, cumbersome and expensive. This implies that much of this data must be aggregated at the “end points” where data is collected. This is where edge computing comes in.

Edge Computing is not devised as a competitor to cloud; it is envisioned as the perfect ally for a broad spectrum of applications for which traditional Cloud Computing is not sufficient. Combining the edge approach with IoT sensors and Cloud would add flexibility and choices for users.

The workshop will be organized in two steps. The first step will answer these questions: 

  1. What problems does edge computing solve?
  2. How can we take advantage of edge computing?
  3. How do edge computing and cloud computing work together?
  4. How can the load be balanced between the edge and the cloud?
  5. What are the current technologies for designing hybrid (edge and cloud) platforms for secure IoT applications? These technologies will be illustrated by hands-on demonstrations.


The second step will present a generic open-source platform for intelligent IoT applications based on a shareable backbone infrastructure composed of three layers: IoT objects, edge devices and cloud infrastructure. Our framework: 

  • delivers machine learning models (MLM) learned in the cloud over data streams that the edge collects from IoT devices.
  • supports lightweight learning algorithms that can execute on the edge and self-adapt without any synchronisation with the cloud.


The platform delivers the following functionalities: 

  1. Coordinate application deployment from the cloud to the edge. The platform will target cloud, edge and IoT devices. IoT applications can be deployed, configured, operated and maintained using a shared infrastructure, where several applications can coexist.
  2. Continuously integrate, deploy and maintain MLMs on edge devices. Learning, which requires considerably more resources, will take place in the cloud, and the learned models will be deployed at the edge. This will considerably decrease the response time and the necessary bandwidth between the cloud and edge layers, since real-time data processing will take place close to the IoT devices. With this functionality, the edge triggers the learning process, although it is performed in the cloud.
  3. Use lightweight, yet powerful, machine learning models that can be set up by, and deployed on, the resource-constrained devices typically used at the edge. This alternative is possible for some IoT applications for which the learning process is light enough to run on the edge. With this functionality, the edge is more autonomous: its intelligence is “improved” locally.
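To make the second and third functionalities more concrete, the following sketch shows a model light enough to learn on an edge device: a running mean and variance (Welford's online algorithm) used to flag unusual sensor readings, plus a simple rule for when the edge would ask the cloud to retrain. This is an illustration only and not the platform's actual API; the class, the function, the thresholds and the sensor values are all hypothetical.

```python
import math


class EdgeAnomalyDetector:
    """Lightweight model that adapts on the edge, without any cloud synchronisation."""

    def __init__(self, threshold=3.0, warmup=5):
        self.n = 0          # number of readings learned so far
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # running sum of squared deviations
        self.threshold = threshold
        self.warmup = warmup

    def update(self, x):
        """Welford's online update of mean and variance."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def is_anomaly(self, x):
        if self.n < self.warmup:
            return False  # not enough data to judge yet
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0:
            return x != self.mean
        return abs(x - self.mean) / std > self.threshold

    def observe(self, x):
        """Score a reading, then learn from it unless it was flagged."""
        flagged = self.is_anomaly(x)
        if not flagged:
            self.update(x)
        return flagged


def process_on_edge(readings, detector, retrain_after=1):
    """Return the flagged readings and whether the edge would request cloud retraining."""
    flagged = [r for r in readings if detector.observe(r)]
    return flagged, len(flagged) >= retrain_after


# Hypothetical temperature readings from one IoT sensor.
readings = [20.0, 20.5, 19.8, 20.2, 20.1, 35.0, 20.3]
flagged, needs_cloud_retraining = process_on_edge(readings, EdgeAnomalyDetector())
print(flagged, needs_cloud_retraining)
```

Only the flag that requests retraining would need to travel to the cloud; the raw readings stay at the edge, which is the bandwidth saving described above.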
Target Audience
  • PhD Students
  • Engineers
  • Scientists
  • Industry practitioners and researchers interested in edge-cloud IoT applications and platforms
  • Anyone involved in the development and deployment of IoT applications

Workshop Prerequisites

Basic knowledge of cloud computing

Workshop Lessons Learned 
  • Understand the added value of the edge compared to a centralized cloud based solution.
  • Browse the most common “technologies” used to deploy hybrid edge-cloud platforms.
  • Discover a “Swiss made” open-source technology used to deploy hybrid edge-cloud platforms.
Organizer
Nabil Abdennadher

Professor: University of Applied Sciences Western Switzerland

Marc-Elian Bégin

SixSq CEO, Co-Founder

Francisco Mendonca

HESGE

Implementing Data Ethics in Business Processes

Date: June 25, 2020
Time: 1:30pm – 5pm

 
Workshop Description

In 2019, the Swiss Alliance for Data-Intensive Services launched a “Data Ethics Code”, an updated version of which will be published in early 2020. A follow-up project is currently underway that will create an implementation guide for the Data Ethics Code. The implementation guide will outline several ways in which data ethics can be integrated into business processes. The guide builds on experience in other domains such as the healthcare system, where implementing ethics structures has a longer tradition than in data ethics.

In the workshop, the beta-version of the implementation guide will be presented and discussed with company experts (e.g., senior management, organization developers, specialists in compliance and data ethics) whose task is to make data ethics operational in their business processes. It will help decision makers to understand the most important issues for a successful implementation of data ethics in a company.

Target Audience

Senior management, organization development experts, compliance experts, data ethics experts

Workshop Prerequisites

The workshop does not require specific prerequisites. The workshop is designed as a physical interaction experience and does not need any laptops. The main methodology is design thinking and focus group discussions.

Workshop Lessons Learned
  • Gaining insight into tools to implement data ethics in companies
Organizers
Markus Christen

Research Group Leader: Digital Society Initiative of the University of Zurich

Markus Christen, PD Dr. sc. ETH, is research group leader in digital ethics and managing director of the Digital Society Initiative of the University of Zurich. christen@ethik.uzh.ch

Christoph Heitz

Professor: Zurich University of Applied Sciences

Christoph Heitz, Prof. Dr., is co-chair of the Data Ethics Expert Group of the Swiss Alliance for Data-Intensive Services. He does research in the area of algorithmic fairness and data-based decision making in digital services.

Michele Loi

Researcher: Digital Society Initiative, University of Zurich

Michele Loi, Dr. Phil, is a researcher at the Digital Society Initiative, University of Zurich, and co-chair of the Data Ethics work group of the Data + Service Alliance. He researches the ethics and political philosophy of data and algorithms.

Machine Learning Push-Down to SAP HANA with Python

Date: June 25, 2020
Time: 1:30pm – 5pm

Workshop Overview

Data Scientists who work in a business environment often require access to SAP data. This data is often held in a high-performance in-memory appliance, which already contains Machine Learning algorithms.

With this tutorial you will learn how to leverage that in-memory appliance to train Machine Learning (ML) models from your preferred Python environment. Trigger predictive algorithms in SAP HANA without having to extract the data. 

If you are already experienced with Machine Learning, you might be curious how to train ML models directly in SAP HANA from your preferred Python environment. That’s right, leverage the power of SAP HANA without leaving your existing Python framework! 

To get hands-on experience we will provide: 

  • Access to a SAP HANA system
  • A Python development environment (JupyterLab)
  • The libraries, installed in your Python environment, that are needed to connect to SAP HANA and push down calculation and training logic
  • A dataset to support the use case
  • A set of Jupyter Notebooks that have been prepared for you

The notebooks will implement a typical Machine Learning scenario in which a regression model is trained using the Predictive Algorithm Library (PAL).
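The idea of push-down can be shown without access to SAP HANA. In the sketch below, Python's built-in sqlite3 module stands in for the database, and a simple linear regression is fitted entirely with SQL aggregates, so that only the fitted coefficients leave the database. This is an analogy under stated assumptions, not SAP HANA's Python API and not PAL; the table, its columns and the data are invented for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for the remote system (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (ad_spend REAL, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(x, 2.0 * x + 1.0) for x in range(1, 6)],  # hypothetical data: revenue = 2 * ad_spend + 1
)

# Least-squares slope computed inside the database from aggregates only.
FIT_SQL = """
SELECT
    (COUNT(*) * SUM(ad_spend * revenue) - SUM(ad_spend) * SUM(revenue))
        / (COUNT(*) * SUM(ad_spend * ad_spend) - SUM(ad_spend) * SUM(ad_spend)) AS slope,
    AVG(ad_spend) AS mean_x,
    AVG(revenue) AS mean_y
FROM sales
"""

# A single row of three numbers is transferred, regardless of the table size.
slope, mean_x, mean_y = conn.execute(FIT_SQL).fetchone()
intercept = mean_y - slope * mean_x
print(slope, intercept)
conn.close()
```

The Python side sends the instruction and receives the result, while the rows themselves never leave the database.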

Target Audience
  • Data Scientists
  • Python Users
  • SAP HANA Users
Workshop Prerequisites
  • Basic Python Know-How
  • Basic JupyterLab Know-How
  • Own laptop with Internet access
  • Google Chrome Browser
Workshop Lessons Learned 
  • You will learn how to get your hands on data located in SAP HANA systems and other data stores
  • You will run notebooks that trigger Machine Learning within SAP HANA
  • You will perform data exploration tasks on the source system without transferring data
  • You will learn how to use the developed model and bring it into an enterprise-ready environment
Co-Organizers
Andreas Forster

Machine Learning Expert: SAP (Schweiz) AG

Thomas Bitterle

Solution Advisor: SAP (Schweiz) AG

Michael Probst

Solution Advisor: SAP (Schweiz) AG

NLPeasy – Harnessing the Power of Unstructured Data

Date: June 25, 2020
Time: 1:30pm – 5pm

Workshop Overview

In most organisations, knowledge is often available only as unstructured text in emails, CRM entries, wiki articles, etc. Harnessing this knowledge and making it available to the users who need it most is a challenging problem. Advances in machine learning and, in particular, NLP open new possibilities for intelligent and efficient knowledge management.

In this hands-on tutorial we will present how to use our open source software NLPeasy to quickly set up Pandas-based pipelines enhanced with ML methods and pre-trained models (e.g. word embeddings, sentiment analysis). The results can then be saved in Elasticsearch, and Kibana dashboards can be automatically generated to explore the texts and results.

You will be led from installing the necessary tools, through setting up a simple yet powerful NLP pipeline, to ingesting texts into Elasticsearch. Then you will explore the generated Kibana dashboards for visualisation and adapt them.
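The overall flow of such a pipeline (enrich each text, then prepare the results for ingestion) can be sketched with the standard library alone. The code below does not use NLPeasy and does not reflect its API: the tiny word lists stand in for a pre-trained sentiment model, and the index name and sample texts are hypothetical. The output follows the newline-delimited JSON layout of the Elasticsearch Bulk API, in which an action line precedes each document.

```python
import json
import re

# Toy lexicon standing in for a pre-trained sentiment model (assumption).
POSITIVE = {"great", "thanks", "resolved"}
NEGATIVE = {"broken", "slow", "error"}


def enrich(text):
    """Add a token count and a naive lexicon-based sentiment score to a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return {"text": text, "n_tokens": len(tokens), "sentiment": score}


def to_bulk_ndjson(docs, index):
    """Serialise documents as action/source line pairs for a bulk request."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Hypothetical support-ticket texts.
texts = ["Login page is broken and slow", "Thanks, issue resolved!"]
docs = [enrich(t) for t in texts]
payload = to_bulk_ndjson(docs, index="tickets")
print(payload)
```

The payload is only built here, not sent; posting it to a running Elasticsearch instance would make the enriched fields available for dashboards.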

We will also show how we used this approach successfully in different use cases.

Target Audience

Data Scientists looking for a toolkit to kick start NLP analyses

Workshop Prerequisites
Workshop Lessons Learned 
  • Learn about the fundamentals of NLP and how these are wrapped in NLPeasy
  • Discover how NLPeasy integrates with Elasticsearch and Kibana
  • Learn how quickly you can start your NLP analysis by setting up your first pipeline with NLPeasy
  • Learn how to leverage NLPeasy for exploratory analysis of textual data and for deep dives
Co-Organizers
Philipp Thomann

Managing Consultant: D ONE

Jacqueline Stählin

Senior Consultant: D ONE