On the 25th of June 2020, the day prior to the SDS2020 conference, Andreas Forster, Thomas Bitterle and Michael Probst from SAP (Schweiz) AG organized a pre-conference workshop entitled “Machine Learning Push-Down to SAP HANA with Python”.
Data Scientists who work in business environments often require access to SAP data. This data is frequently held in a high-performance, in-memory appliance that already contains Machine Learning algorithms. This tutorial, suited for Data Scientists, Python users and SAP HANA users, showed the participants how to leverage the in-memory appliance to train Machine Learning models from their preferred Python environment. It also showed how to trigger predictive algorithms in SAP HANA without having to extract the data.
The participants learned how to get their hands on data located in SAP HANA systems and other stores. They also experienced running notebooks to trigger Machine Learning with SAP HANA. They performed data exploration tasks on the source system without transferring data. Finally, they learned how to use the developed model and bring it into an enterprise-ready environment.
We want to say a huge “thank you” to all of our participants. We are glad that you all passed the workshop with excellence and hope you can leverage the new skills in the future.
In 2019, the Swiss Alliance for Data-Intensive Services launched the first version of its “Data Ethics Code”, whose update is currently in production and will be published in the fall of 2020. The Code will include an Implementation Guide that outlines several possibilities on how data ethics can be integrated in companies and business processes. Both the Codex recommendations and the Implementation Guide were topics of this workshop.
In total, seven persons representing a broad spectrum of institutions (Swiss Re, PostFinance, SMIs, public administrations and academia) participated in the workshop led by Christoph Heitz (ZHAW) as well as Markus Christen and Michele Loi (both from the University of Zurich). After a general introduction by Christoph and in-depth presentations on the Codex by Michele and on the Implementation Guide by Markus, the participants split into three small groups representing different types of organizations (large and small companies, public institutions). Each group discussed which data ethics problems typically emerge in its context and which ethics structures would be adequate to resolve them.
The discussion revealed that large companies usually do have organizational measures in place to handle data ethics issues, but these measures may sometimes lack grounding in the companies’ day-to-day processes. A key challenge for smaller companies is to recognize that ethical issues are part of their products and services. The difficulty of operationalizing key values of the Codex, such as transparency, was discussed as well.
We thank the participants for an interesting workshop.
The 7th Swiss Conference on Data Science was held online on the 26th of June 2020. On the preceding day, the 25th of June 2020, Claudio G. Giancaterino organized a pre-conference workshop that approached Exploratory Data Analysis from a different point of view.
Usually the goal of Exploratory Data Analysis (EDA) is to understand patterns, detect mistakes, check assumptions and check relationships between variables of a data set with the help of graphical charts and summary statistics.
Instead, the goal of this workshop was to extend the classical EDA journey into a wider pipeline. Using an experimental, iterative approach, it tried to understand, step by step, the impact of each action on the behavior of the models. The result was an Exploratory Data & Models Analysis.
The whole online workshop was conducted in a webinar format in which the 18 attendees could interact with the speaker by asking questions through a Q&A chat box during the presentation. The seminar was hands-on: at almost every step of the journey, attendees could either run Google Colaboratory notebooks on a sample of the data set and look at the results, or challenge themselves with exercise notebooks by filling in pieces of missing code.
Participants showed interest in the workshop: during the webinar they asked questions about all topics, the minutes ticked by quickly, and at the end of the seminar they posted positive feedback.
Topics covered & discussed:
During the workshop a data set from a data science competition was used, and the goal of the classification task was to develop a model to predict whether or not a mortgage can be funded, based on certain factors in a customer’s application data.
The journey started with a quick look at the data set with the help of a visualization tool: AutoViz. Participants were thrilled with the tool used.
Then, the work on the data set was split into two paths: categorical variables, with an encoding activity (transforming each category string into a numerical representation), and numerical variables, used to look at the performance of several baseline models.
The Q&A chat box showed questions about issues linked to the use of one-hot encoding (it expands the feature space by adding one column per category).
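To make the feature-space issue concrete, here is a minimal sketch of one-hot encoding in plain Python. It is not the code from the workshop notebooks; the column names and values (`payment_frequency`, `amount`) are made up for illustration.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical mortgage applications (illustrative values only).
applications = [
    {"payment_frequency": "monthly", "amount": 250000},
    {"payment_frequency": "weekly", "amount": 180000},
    {"payment_frequency": "bi-weekly", "amount": 320000},
    {"payment_frequency": "monthly", "amount": 410000},
]

encoded = one_hot_encode(applications, "payment_frequency")
print(len(applications[0]), "columns before,", len(encoded[0]), "after")
# 2 columns before, 4 after
```

A column with three categories becomes three columns, so a categorical variable with hundreds of distinct values would add hundreds of features.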
At this point we handled missing values, replacing them with a data imputation strategy instead of removing the affected rows. We did this because dropping rows risks discarding relevant information, which is why it is preferable to work with a complete data set. The participants agreed with this.
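The trade-off between dropping and imputing can be sketched as follows. The report does not say which imputation strategy was used, so median imputation is an assumption here, and the `incomes` values are invented.

```python
from statistics import median


def impute_median(values):
    """Fill missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical numerical column with two missing entries.
incomes = [52000, None, 61000, 48000, None, 75000]

dropped = [v for v in incomes if v is not None]  # row removal: 4 of 6 rows survive
imputed = impute_median(incomes)                 # imputation: all 6 rows survive

print(len(dropped), len(imputed))
# 4 6
print(imputed)
# [52000, 56500.0, 61000, 48000, 56500.0, 75000]
```

With row removal, a third of this toy data set is lost along with whatever the other columns of those rows contained; imputation keeps every row at the cost of introducing estimated values.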
We then applied Exploratory Data Analysis to the data set, using bivariate analysis as feature selection for the relevant features. One of the questions was about the difference between PCA (originally a dimensionality reduction technique, but often used to create new features) and the approach followed here (used to select the features most predictive of the target variable).
Before going to the last step, the handling of imbalanced classification, we managed outliers (extreme values that fall far away from the other observations). To these we applied a logarithmic transformation to correct the skewness of some variables, or discretization for mixture distributions, and a new numerical feature was created. As explained, feature engineering can sometimes be frustrating because it generates correlated features that need to be deleted in the preprocessing step, and business knowledge can play a significant role in the application of this methodology.
In the last step we discussed some strategies for dealing with imbalanced classes in the classification task and applied the following techniques:
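As a rough illustration of how a logarithmic transformation reduces skewness, the sketch below compares the skewness of a right-skewed variable before and after the transformation. The `property_values` are invented, and `log1p` (the logarithm of 1 + x) is one common choice, not necessarily the one used in the workshop.

```python
import math


def skewness(values):
    """Population skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed variable: a few very large values stretch the tail.
property_values = [100000, 120000, 150000, 200000, 300000, 500000, 900000, 1600000]

# log1p compresses large values much more than small ones.
log_values = [math.log1p(v) for v in property_values]

print(f"skewness before: {skewness(property_values):.2f}")
print(f"skewness after:  {skewness(log_values):.2f}")
```

The transformed variable is still slightly right-skewed, but the extreme values no longer dominate the distribution.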
· Oversampling: randomly sample (with replacement) the minority class to reach the same size of the majority class.
· Undersampling: randomly subset the majority class to reach the same size of the minority class.
· SMOTE (Synthetic Minority Over-sampling Technique): an over-sampling method that creates synthetic samples from the minority class instead of creating copies from it.
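The three techniques above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the workshop code: the data points are invented, and `smote_like` is a reduced version of SMOTE that interpolates between a minority sample and one of its `k` nearest minority neighbours. In practice a library such as imbalanced-learn would typically be used instead.

```python
import math
import random


def random_oversample(majority, minority, rng):
    """Resample the minority class with replacement up to the majority size."""
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return minority + extra


def random_undersample(majority, minority, rng):
    """Keep a random subset of the majority class, as large as the minority class."""
    return rng.sample(majority, k=len(minority))


def smote_like(minority, n_new, k, rng):
    """Create synthetic minority points by interpolating towards near neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between base and neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical two-feature samples: 8 majority points, 3 minority points.
majority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (2.5, 1.9),
            (3.0, 2.1), (3.5, 2.4), (4.0, 2.0), (4.5, 2.3)]
minority = [(6.0, 5.0), (6.5, 5.5), (7.0, 5.2)]

oversampled = random_oversample(majority, minority, rng)
undersampled = random_undersample(majority, minority, rng)
synthetic = smote_like(minority, n_new=5, k=2, rng=rng)

print(len(oversampled), len(undersampled), len(minority) + len(synthetic))
# 8 3 8
```

Random oversampling only repeats existing minority points, whereas the SMOTE-style points are new values lying on line segments between existing minority points.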
In all the steps, except the first one, we applied a modeling process to evaluate the impact of each action on the performance of the models, and the attendees were immediately interested in which models we used: Logistic Regression, AdaBoost, Gradient Boosting Machine, Bagging, Random Forest and Neural Network. For the Oversampling method, the best models were Gradient Boosting Machine and AdaBoost.
From the Feature Importance Analysis using permutation, the feature best able to explain the target values was Property Value for almost all models, whereas with SHAP values for the Gradient Boosting Machine, the best feature was Payment Frequency, specifically the Monthly Payment. The participants’ curiosity focused on this feature because it also ranked first in importance in the Logistic Regression. Attention also turned to the feature created as the product of the interest rate and the loan-to-value ratio, which showed importance as well.
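For readers unfamiliar with permutation importance, the idea can be sketched as follows: shuffle one feature column, re-score the model, and record how much the score drops. Everything in this example is an assumption made for illustration, including the data, the threshold rule standing in for a trained model (`toy_model`) and the feature names. scikit-learn offers a ready-made version as `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model predicts the correct label."""
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, feature, n_repeats, rng):
    """Mean drop in accuracy when the values of one feature are shuffled."""
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the permuted feature.
        permuted = [{**row, feature: value} for row, value in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


def toy_model(row):
    """Stand-in for a trained classifier: it looks at property_value only."""
    return 1 if row["property_value"] > 400000 else 0


# Hypothetical applications; "noise" is a feature the model ignores.
values = [150000, 220000, 310000, 380000, 390000, 420000, 480000, 560000, 640000, 720000]
noise = [3, 7, 1, 9, 4, 8, 2, 6, 5, 0]
rows = [{"property_value": v, "noise": n} for v, n in zip(values, noise)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
importances = {
    feature: permutation_importance(toy_model, rows, labels, feature, 20, rng)
    for feature in ("property_value", "noise")
}
print(importances)
```

Shuffling a feature the model relies on lowers the accuracy, while shuffling an ignored feature leaves it unchanged, so the importance of `noise` is exactly zero here.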
At this year’s virtual pre-conference workshop, Philipp, Jacqueline, and Jürgen from D ONE got the chance to teach 20 participants about Natural Language Processing (NLP) with the Python package NLPeasy. Philipp has written this Python package to make it easier for data scientists to get started with NLP. It is a wrapper for other NLP packages such as VaderSentiment and SpaCy, and builds a bridge to Elasticsearch and Kibana. Elasticsearch is a document database, while Kibana is a dashboarding tool that can read data from Elasticsearch efficiently. The setup of Elasticsearch and Kibana is made simple by running them in Docker containers. This way, the package can be used without lengthy installation and is a great starting point for any data exploration journey that works with text data.
In this pre-conference workshop, D ONE taught participants the main methods that are used in NLP. Because of the virtual setup, it was hard to involve participants in discussions. This is why they chose to deliver content in two modes. First, Philipp, who is trained in both Mathematics and Linguistics, explained the theoretical aspects in a classroom setting. Then, the participants split into smaller groups and entered breakout rooms. Jacqueline, Jürgen and Philipp helped the participants try out how NLP methods work in practice. For example, participants looked into how different words are represented by vectors. They also created and visualised syntax trees of different sentences.
Participants creating and visualising syntax trees of sentences during a breakout session.
D ONE did not want participants to get lost during the workshop, so they tried to limit technical problems as much as possible. The approach is highly recommended: D ONE hosted VMs on a Binder Hub, so that participants did not have to follow lengthy installation protocols before the workshop. With one link, people could connect to the hub, where their own machine was instantiated. And off they went to start with NLP! For those interested, the NLPeasy tutorial can also be found on D ONE’s GitHub account.
After discovering NLP basics, participants explored in a first tutorial how NLPeasy can be set up. To this end, D ONE used a freely available, anonymised dataset from a dating website. It included text answers to profile sections of the app, such as “Describe yourself” and “What are you doing on a typical Friday night?”, as well as more structured information like city, languages, and age. The participants did some basic feature engineering, and then defined the pipeline steps they wanted to include, such as the text columns used for sentiment analysis. Then, they ran the enrichment on the dataset. Finally, they loaded the enriched dataset into Elasticsearch and a basic Kibana dashboard was created automatically, all using NLPeasy out-of-the-box functions.
An automatically generated Kibana-dashboard from the dating app dataset.
Overall the NLPeasy workshop was a great experience for both participants and hosts. With the workshop being held remotely, people from further away could participate; for example, one participant joined from King Abdulaziz University in Saudi Arabia. It was fun to have such a diverse audience. Participants walked out of the workshop equipped with a new toolset, no longer afraid of textual data analysis but ready to dive right into it. D ONE will host this workshop again in the future, be it on site or remotely. For interested readers, D ONE is also happy to tailor an introductory NLP course to your company’s needs, so feel free to reach out any time!
Because of the exceptional situation, industry, academic and individual members of the Swiss Alliance for Data-Intensive Services joined the A.I. Use-Case Talk on the 17th of June 2020 in an online format for the very first time.
The Use-Case Talk Series allows participants to enjoy in-depth technical discussions and exchange information about interesting technical challenges amongst experts.
This time three industry experts and numerous participants took part in the Use-Case Talk to share stories and insights about frameworks, best practices and tools in data science.
The first Use-Case was presented by Christian Kindler, Full Stack Data Scientist at Valdon Mesh GmbH. Our participants learned how to Trade Options with A.I. Methods. Christian explained the Auto-Regressive Feed Forward Neural Network and demonstrated how A.I. can be used for trading.
Our second speaker, Achim Kohli, Co-Founder and CEO of legal-i, presented a live demo and participants gained insight into the company’s tool, which allows insurance lawyers to become 10x faster with the help of A.I.
Our last speaker of the night was Mark Schuster, Channel Manager at UiPath Switzerland GmbH. Mark explained how to apply A.I. to RPA workflows in minutes and provided interesting insights into Digital Claims and Voice Enabled Travel Systems.
Following the presentations, our speakers answered several questions by the participants. In the interesting Q&A session we exchanged ideas, challenges and information among the industry and academic experts.
This first online version of the Use-Case Talk Series was a success. However, we hope that we are able to host the Use-Case Talk Series in Technopark Zurich again in the near future so that participants have the opportunity to network and connect with other industry specialists at our sponsored Use-Case Talk Apéro.
The Use-Case Talks are part of a series that takes place three times a year. If you are interested in sharing your A.I. stories and discussing them with other industry members, you are warmly welcome to join us for our next Use-Case Talk taking place on the 21st of October 2020. If you are interested in presenting a Use-Case, please contact us by e-mail (firstname.lastname@example.org).
The Use-Case Talk Series is organized by Aspaara Algorithmic Solutions AG on behalf of Swiss Alliance for Data-Intensive Services.
On Thursday, 18th June 2020, we had our 9th meeting of the Expert Group “Blockchain Technology in Supply Chain Management”. The virtual meeting was dedicated to the concept and the application of ‘Self-Sovereign Identities’ in the digital area.
In the first part of the meeting, Martin Fabini, CTO of ti&m, gave us a short overview of the current concepts, developments and applied use cases relating to SSIs. He explained how we have already moved from ‘centralized identities’ to ‘federated identities’ in our daily use of digital applications. However, he pointed out that we will need to develop a more user-centric approach, a fully ‘self-sovereign identity’ (SSI), to get full control over our digital identities. The SSI building blocks, and their technological solutions, are currently a hot topic discussed in different consortia and foundations. A crucial part of all SSI concepts is a ‘decentralized identifier’ (DID) based on a decentralized trusted infrastructure provided by distributed ledger technologies (‘blockchain’).
In the second part of the meeting we discussed, in virtual breakout rooms, the possible business cases as well as the legal, technological, and business issues with SSI. In the discussion, we saw that there are many potential use cases and that some group members have already worked on some concrete pilots. However, the development of globally accepted standards is needed in order to overcome current legal and technological issues and to fully develop interoperable SSI concepts. Only a common SSI solution stack with compatible protocols will unlock the enormous business potential of SSIs.
Unfortunately, this time the meeting concluded without our usual apéro, for obvious reasons. For our next meeting, planned for September, we hope that we can resume this tradition and meet in person for an after-meeting networking apéro to exchange further ideas and contacts.
The 7th meeting of the spatial data analytics expert group was held online on June 4th 2020. The topic of the meeting was geovisualizations. This was the first meeting kindly hosted by a meeting chair. Simon Würsten from SBB had taken the lead in defining the schedule and in choosing the two talks.
Kevin Lang (SBB) presented the talk “Visualisation of accessibility at SBB”. It became clear that he had been confronted with some well known mapping problems and he presented nice solutions by his team. The talk generated questions and discussions.
Ralf Mauerhofer (Koboldgames) talked about “Murgame.ch – Natural hazard prevention by means of an online game” and gave some behind-the-scene-insights into the development of a game.
As usual, the participants had the possibility to interact and discuss. Part of the meeting was conducted in breakout rooms for group discussions. After the meeting the participants gave arguments for and against online meetings:
In theory it is possible to reach a larger audience in online meetings, because of the convenience of not having to spend any time traveling. In practice, however, the number of participants (19) was similar to previous in-person meetings. Sadly some participants could not connect due to internet security policies in their private network. A nice feature of the online format was the chat box where participants could ask questions even during the presentations. It was also used to share links and suggestions. After the meeting there was an informal discussion in a smaller group. We’re sure that the group would have been larger with the prospect of an Apero and a beer…
As a result we announce: The 8th meeting of the spatial data analytics expert group will be a physical meeting. The topic is “Applied sampling strategies” and Madlene Nussbaum will be our meeting chair. The meeting will take place on Thursday, August 27th 2020.
Value creation through data is a very current topic of utmost importance. However, in practical applications it is often not sufficiently clear, or unknown, how value can be created for businesses and their customers. The CAS Data Product Design / Smart Service Engineering course offers practical solutions and concrete options for how companies can easily generate service value from data. In four modules, participants acquire methodological knowledge of data-driven value creation and value capturing (including data-driven business models) and address questions of data ethics, data protection and data security.
In the course, new methods are taught and applied directly to a continuous case study in numerous iterations. The participants form small groups and choose a problem at the beginning of the course for which they develop a data-driven service during the course. During the two-day practical workshop at the Mobiliar Forum Thun in the beginning of June, we developed service ecosystems in intensive iterations to sharpen data-driven value propositions, including testing with potential users. This year’s special feature: the entire workshop took place online, which worked very well. Many thanks to Ina Goller for guiding us through the workshop and pushing our cases forward. We learned a lot. Many thanks also to Fabio Rovelli, managing director of the Mobiliar Forum Thun, for enabling this workshop.
The participants will publish their cases in short papers starting this year – summarized in an eBook. More information on this will be available over the summer, e.g., on the website of the Swiss Alliance for Data-Intensive Services.
legal-i is like a virtual lawyer, based on artificial intelligence (AI). It specializes in medical insurance cases (such as Invalidity-, Accident-, Health-, BVG- and Liability-Insurance). In this field lawyers and insurance experts need to read through hundreds and thousands of pages in case-files. legal-i helps these experts find the relevant data ten times faster. It also compares new cases with similar ones in the archives to predict their length, complexity and cost. Additionally, it can determine the internal expert with the most competence to work on the new case.
Symbiosis between Man and Machine
legal-i does not replace technical experts. Rather, it creates a symbiosis between man and machine. While the machine can search an enormous amount of data in a few seconds, the expert is essential in evaluating the results, putting them into social context and building arguments and making decisions based on them. Man and machine thus do not stand in competition with each other but work together, effectively complementing each other. This is often portrayed very differently in press articles and science fiction movies.
What is legal-i’s background story?
The founder of legal-i, Achim Kohli, is a trained lawyer with three years of experience at a law firm. During his time at the firm, he became aware of how time-consuming and ineffective the study of documents was. He also found it frustrating that archived knowledge was not systematically reused. He discussed these issues with various people from the insurance and legal industries and found that they, as well as companies at large, shared his problem. As a consequence, Achim decided to leave his job in order to build a first prototype of legal-i, together with CTO Markus Baumgartner and AI specialist Prof. Erik Graf. This first prototype immediately met with a very positive response from the industry.
Why is it important that legal-i exists?
Insurance experts and lawyers should spend their time on value-adding activities. They are too expensive and too well trained to “waste” their time with “boring” and “error-prone” work that a machine can do better. Today, there is still a lot to improve in this regard. A case in the medical insurance field often contains between 400 and 5,000 pages spanning over 200 separate documents. Insurance companies, lawyers, legal expense insurers, experts and judges, among others, need to look through all these documents, which is a time-consuming endeavor. The pages are read, either by scrolling (in a PDF reader) or in printed form. Then they are sorted in order to find the relevant data in the relevant documents. This way, highly qualified medical insurance experts waste a lot of time doing bulk work, with no guarantee of finding all relevant information.
Who can profit from your services?
Insurance companies, medical experts and law firms that work with Invalidity-, Accident-, Health-, BVG- and Liability-Insurance cases can profit from legal-i in particular.
legal-i is all about empowering them in the effective use of their highly qualified expertise.
Can you give some more specific examples of legal-i’s tasks?
legal-i has several AI-models with the following functions:
It shows experts an overview of all the diagnoses in a case, regardless of how they are formulated. Then the expert can access a specific text passage with one click.
It compares diagnoses between different documents so that the user can quickly find out if, for example, a judge failed to consider a certain diagnosis that should have been taken into account.
It recognizes the document type: this way it is easier for a user to navigate a huge case.
It finds similar cases, medical expert opinions and judgments in the archives.
It offers an optimised keyword search that also shows words with similar meaning.
What are your biggest challenges?
Our biggest challenge is finding the best IT experts in our field. This is necessary because extracting relevant facts from unstructured medical and legal data is very complex and challenging.
We also want to understand the needs of each individual customer ever more deeply and thoroughly.
How do you see the future of legal-i and what is your long-term goal?
Our vision is the perfect symbiosis between man and machine in the domain of medical insurance. This symbiosis empowers medical insurance experts to focus fully on the activities that add value, while our AI assistant accomplishes the case-file study in a near perfect way.
Our long-term goal is to bring legal-i to the DACH-region and then to become the world’s leading insurtech start-up in the case-file study of medical insurance cases.
In these exceptional times of COVID-19, the Expert Groups of the Swiss Alliance continue to find ways to work! The Expert Group Leaders met online for the first time on the 28th of April.
18 Leaders of 10 Expert Groups took part in this first online exchange. All leaders presented their planned activities for 2020 and shared their experiences in getting the greatest added value for the members, discussing their challenges and lessons learned.
The “Smart Service” Expert Group shared the experience they had already gained in holding meetings online. They had planned their quarterly expert group meeting in a lunch-talk format, the “Service Lunch”, on March 18. Because of the restrictions, they held it online instead, which proved to be a success and attracted even more participants than usual. The group decided to continue hosting meetings in online formats even after the restrictions are lifted and physical meetings are possible again.
The Group Leaders shared some of the concrete plans the Expert Groups are working on.
The Expert Group “Data Ethics” is in the process of finalizing the Data Ethics Codex and is planning a training course in data ethics. The group is also looking forward to organizing another joint public event in collaboration with the UZH Digital Society Initiative, similar to the events “Algorithm Ethics” in 2018 and “Data Ethics” in 2019.
The Expert Group “Smart Services” shared their successful experiences in engaging with members and in setting up concrete and successful project proposals.
Members of the Expert Group “Blockchain in Supply Chain Management” are working on an article on a pre-study about blockchain-based track & trace.
The Leaders also discussed the value of continued exchange and collaborations between the groups, with some concrete collaborations planned in 2020.
Unfortunately, because of the online format, we were short of time for further discussion but will proceed with our activities in shaping innovation together with the members.