NLPeasy – Harnessing the Power of Unstructured Data, Pre-conference Workshop 25.06.2020
At this year’s virtual pre-conference workshop, Philipp, Jacqueline, and Jürgen from D ONE got the chance to teach 20 participants about Natural Language Processing (NLP) with the Python package NLPeasy. Philipp wrote this package to make it easier for data scientists to get started with NLP. It wraps other NLP packages such as vaderSentiment and spaCy, and builds a bridge to Elasticsearch and Kibana. Elasticsearch is a search engine that stores and indexes documents, while Kibana is a dashboarding tool that reads data from Elasticsearch efficiently. Running Elasticsearch and Kibana in Docker containers keeps their setup simple. This way, the package can be used without lengthy installations and is a great starting point for any data exploration journey that involves text data.
In this pre-conference workshop, D ONE taught participants the main methods used in NLP. Because of the virtual setup, it was harder to involve participants in discussions, so the team chose to deliver the content in two modes. First, Philipp, who is trained in both Mathematics and Linguistics, explained the theoretical aspects in a classroom setting. Then, the participants split into smaller groups in breakout rooms, where Jacqueline, Jürgen and Philipp helped them try out how NLP methods work in practice. For example, participants looked into how different words are represented as vectors, and they created and visualised syntax trees of different sentences.
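To give a feel for what it means that words are represented by vectors, here is a minimal sketch in plain Python. The three-dimensional vectors below are hand-picked toy values, not the output of spaCy or any trained model (real word vectors typically have hundreds of dimensions); the sketch only shows how cosine similarity compares two such vectors.

```python
import math

# Hypothetical toy "embeddings", hand-picked for illustration only.
# Real word vectors (e.g. from spaCy models) have hundreds of dimensions.
TOY_VECTORS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Compare a pair of related words with a pair of unrelated ones.
    for w1, w2 in [("cat", "dog"), ("cat", "car")]:
        sim = cosine_similarity(TOY_VECTORS[w1], TOY_VECTORS[w2])
        print(f"{w1} vs {w2}: {sim:.3f}")
```

With these toy values, "cat" and "dog" point in nearly the same direction and score close to 1, while "cat" and "car" score much lower. Cosine similarity is a common choice here because it ignores vector length and compares only direction.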
Participants creating and visualising syntax trees of sentences during a breakout session.
D ONE did not want participants to get lost during the workshop, so they tried to limit technical problems as much as possible. The approach is highly recommended: D ONE hosted the participants’ environments on a BinderHub, so that participants did not have to follow lengthy installation instructions before the workshop. With a single link, people could connect to the hub, which launched their own instance. And off they went to start with NLP! For those interested, the NLPeasy tutorial can also be found on D ONE’s GitHub account.
After covering the NLP basics, participants explored how to set up NLPeasy in a first tutorial. For this, D ONE used a freely available, anonymised dataset from a dating website. It included free-text answers to profile sections, such as “Describe yourself” and “What are you doing on a typical Friday night?”, as well as more structured information like city, languages, and age. The participants did some basic feature engineering and then defined the pipeline steps they wanted to include, such as which text columns to use for sentiment analysis. Next, they ran the enrichment on the dataset. Finally, they loaded the enriched dataset into Elasticsearch, and a basic Kibana dashboard was created automatically, all with NLPeasy’s out-of-the-box functions.
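The enrichment and loading steps can be sketched in plain Python to show what happens conceptually. This is not NLPeasy’s API: the column name `essay_self`, the helper functions, the toy sentiment lexicon (a crude stand-in for a real model such as vaderSentiment) and the sample rows are all hypothetical. The last function serialises the enriched records in the newline-delimited format that Elasticsearch’s `_bulk` endpoint accepts; NLPeasy takes care of loading the data itself.

```python
import json

# Hypothetical toy lexicon: a crude stand-in for a real sentiment model
# such as vaderSentiment. Scores are illustrative only.
TOY_LEXICON = {"love": 1.0, "fun": 0.8, "great": 0.9, "boring": -0.7, "hate": -1.0}


def toy_sentiment(text):
    """Average lexicon score of the known words; 0.0 if none are found."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def enrich(rows, text_columns):
    """Hypothetical pipeline step: add sentiment and word count per text column."""
    enriched = []
    for row in rows:
        doc = dict(row)  # copy so the input rows stay unchanged
        for col in text_columns:
            text = row.get(col) or ""  # treat missing answers as empty text
            doc[f"{col}_sentiment"] = toy_sentiment(text)
            doc[f"{col}_length"] = len(text.split())
        enriched.append(doc)
    return enriched


def to_bulk_ndjson(docs, index):
    """Serialise docs for Elasticsearch's _bulk endpoint: action line + source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


if __name__ == "__main__":
    # Made-up example rows, not taken from the workshop dataset.
    profiles = [
        {"age": 29, "city": "Springfield", "essay_self": "I love hiking, it is great fun!"},
        {"age": 35, "city": "Springfield", "essay_self": "Parties are boring."},
    ]
    docs = enrich(profiles, ["essay_self"])
    print(to_bulk_ndjson(docs, "profiles"))
```

Each configured text column gains derived fields (here a sentiment score and a word count), which is the kind of structured information a Kibana dashboard can then aggregate and chart.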
An automatically generated Kibana dashboard from the dating app dataset.
Overall, the NLPeasy workshop was a great experience for both participants and hosts. Because the workshop was held remotely, people from farther away could participate; one participant, for example, joined from King Abdulaziz University in Saudi Arabia. It was fun to have such a diverse audience. Participants left the workshop equipped with a new toolset, ready to dive right into textual data analysis instead of being afraid of it. D ONE will host this workshop again in the future, be it on site or remotely. For interested readers, D ONE is also happy to tailor an introductory NLP course to your company’s needs, so feel free to reach out any time!