SDS2019 – Poster Presentation
Reception – A Deep Learning Based Hybrid Residual Networks
Deep neural networks can be difficult to train and often require extensive hyper-parameter tuning. In this paper, we propose a generalized deep hybrid convolutional neural network model, named Reception, that not only solves the problem of finding the optimal kernel size but also combines features of both ResNet and Inception. The proposed Reception module complements the learning of filters with small receptive fields with that of filters with large receptive fields, allowing the network to extract the finest details as well as the broadest shapes. Although this strategy increases the width of the network and the number of parameters, it significantly reduces the required depth. Moreover, a carefully crafted design keeps the number of parameters in check. When used to classify ships in satellite images, the model achieves a mean test accuracy of 98.56% with a standard deviation of 0.14 in 5-fold cross-validation, and an F1-score of 0.99.
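To make the idea of mixing small and large receptive fields concrete, here is a minimal, single-channel 1-D sketch in plain Python. It is not the authors' Reception module. It assumes that each branch is a zero-padded ("same") convolution with a different kernel size, that branch outputs are combined by a weighted sum (the single-channel equivalent of concatenation followed by a 1×1 convolution), and that a ResNet-style identity shortcut adds the input back. The names `conv1d_same` and `reception_block_1d`, and the mixing weights, are hypothetical.

```python
from typing import List, Sequence


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D cross-correlation with zero padding; output length equals len(x).

    Assumes an odd kernel length so the kernel is centred on each position.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(x):  # positions outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def reception_block_1d(
    x: Sequence[float],
    kernels: Sequence[Sequence[float]],
    mix_weights: Sequence[float],
) -> List[float]:
    """Hypothetical multi-kernel block with a residual connection.

    Each kernel size gives a branch with a different receptive field
    (Inception-style); the branches are mixed by a weighted sum, and the
    input is added back as an identity shortcut (ResNet-style).
    """
    branches = [conv1d_same(x, k) for k in kernels]
    mixed = [
        sum(w * branch[i] for w, branch in zip(mix_weights, branches))
        for i in range(len(x))
    ]
    return [xi + mi for xi, mi in zip(x, mixed)]
```

For example, `reception_block_1d(signal, [[-1.0, 2.0, -1.0], [0.2] * 5], [0.5, 0.5])` combines a 3-tap kernel that responds to local changes with a 5-tap averaging kernel that responds to broader shapes. In a real network the branches would be learned, multi-channel 2-D convolutions; the residual shortcut is what lets a wider but shallower block still train easily.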
Predictive Modeling for Optimization of Field Operations in Bike-Sharing Systems
This article presents a framework to facilitate and optimize the management of field operations for bike-sharing companies. The study focuses on two modules based on artificial intelligence: a prediction module that uses machine learning to forecast bike availability at the station level, and a rebalancing module that uses constraint programming to provide optimal rebalancing operations and routes. An evaluation on 9 months of data collected from a real bike-sharing network notably highlighted the superior forecasting accuracy of the Multilayer Perceptron algorithm.
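The abstract does not detail how forecasts feed the rebalancing module, so the following is only a hedged illustration of that hand-off. It assumes each station has a target fill level (here a hypothetical 50% of capacity), turns predicted bike counts into surpluses and deficits, and pairs them with a simple greedy heuristic. The paper uses constraint programming for this step, which also optimizes routes; the greedy pairing below is a deliberately simpler stand-in, and the names `station_imbalance` and `greedy_transfers` are hypothetical.

```python
from typing import Dict, List, Tuple


def station_imbalance(
    predicted: Dict[str, float],
    capacity: Dict[str, int],
    target_fill: float = 0.5,  # hypothetical target: half-full stations
) -> Dict[str, int]:
    """Positive value = surplus bikes to pick up; negative = bikes to drop off."""
    return {s: round(predicted[s] - target_fill * capacity[s]) for s in predicted}


def greedy_transfers(imbalance: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Greedy stand-in for CP: move bikes from largest surpluses to largest deficits."""
    surplus = sorted(
        ([s, v] for s, v in imbalance.items() if v > 0), key=lambda t: -t[1]
    )
    deficit = sorted(
        ([s, -v] for s, v in imbalance.items() if v < 0), key=lambda t: -t[1]
    )
    transfers = []
    i = j = 0
    while i < len(surplus) and j < len(deficit):
        n = min(surplus[i][1], deficit[j][1])
        transfers.append((surplus[i][0], deficit[j][0], n))
        surplus[i][1] -= n
        deficit[j][1] -= n
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1
    return transfers
```

A greedy pass like this ignores truck capacity, travel distance and visit order, which is precisely what a constraint-programming formulation can encode.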
High Precision Agriculture: An Application Of Improved Machine-Learning Algorithms
This paper presents the performance of machine learning algorithms for object detection in aerial images for high precision agriculture. The dataset consists of geotagged pictures of vineyards. We demonstrate that advanced machine learning methodologies, such as Decision Tree Ensembles, outperform the state-of-the-art image recognition algorithms generally used in agriculture. The approach described here improves object detection and achieves an accuracy of 94.27%, an increase of more than 4% over the state of the art. Finally, the methodology and possible developments for high precision agriculture are discussed.
A reliable approach for pixel-level classification of land usage from spatio-temporal images
Ongoing advances in deep learning, and the excellent results obtained on a range of problems involving spatio-temporal satellite images, have made deep neural networks popular for analysing Earth Observation data. Deep learning models can learn complex, problem-specific features from the available data. In this research, the aim is to classify field parcels in images from the Sentinel-2A satellite and identify the corresponding crops using a Recurrent Neural Network (RNN). Training a good classification network requires clean, reliably labelled data, which is challenging to obtain in real-world applications. What if the labels are unreliable because of manual errors or the complexity of annotating the data?
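As an illustration of how a recurrent network consumes a per-pixel time series, here is a minimal Elman RNN forward pass in plain Python. It is a hedged sketch, not the network used in this research: the weights are random and untrained, the input is assumed to be a short sequence of band values for one pixel (one vector per acquisition date), and nothing here addresses the unreliable-label problem raised above. The class `TinyRNN` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class TinyRNN:
    """Hypothetical single-layer Elman RNN with random (untrained) weights."""

    def __init__(self, n_in: int, n_hidden: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)

        def matrix(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.W_xh = matrix(n_hidden, n_in)       # input -> hidden
        self.W_hh = matrix(n_hidden, n_hidden)   # hidden -> hidden (recurrence)
        self.W_hy = matrix(n_classes, n_hidden)  # hidden -> class logits

    def forward(self, sequence: Sequence[Sequence[float]]) -> List[float]:
        """Run over the acquisition dates and return class probabilities."""
        h = [0.0] * len(self.W_hh)
        for x_t in sequence:
            h = [
                math.tanh(
                    sum(w * x for w, x in zip(self.W_xh[j], x_t))
                    + sum(w * hk for w, hk in zip(self.W_hh[j], h))
                )
                for j in range(len(h))
            ]
        logits = [sum(w * hj for w, hj in zip(row, h)) for row in self.W_hy]
        return softmax(logits)
```

For instance, `TinyRNN(n_in=2, n_hidden=8, n_classes=3).forward(pixel)` with `pixel` holding four dates of two band values returns three crop-class probabilities. The recurrence is what lets the final hidden state summarise the whole growing season rather than a single image.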
Predicting Housing Market Trends Using Twitter Data
The main goal of this study is to predict the short-term upward or downward trend of the average house price in the Dutch market using text data collected from Twitter. Tweets containing predefined search words, chosen using domain knowledge, are collected, and their text is grouped by month into documents. Words and word sequences are then transformed into numerical values, which serve as attributes to predict the housing market trend; that is, we approach this as a binomial classification problem that relates the text data of a month to the (up or down) trend of the following month. Our results reveal a correlation between the (weighted) frequency of words and short-term housing trends.
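The feature construction described above can be sketched as follows. This is a hedged illustration under stated assumptions: tokens come from lowercasing and whitespace splitting, "word sequences" are taken to be bigrams, raw counts stand in for the (weighted) frequencies, and a month is labelled 1 ("up") when the following month's average price is higher than its own. The functions `ngram_counts` and `build_dataset` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngram_counts(text: str, n_max: int = 2) -> Counter:
    """Count unigrams and (assumed) bigrams in one monthly document."""
    tokens = text.lower().split()
    counts: Counter = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def build_dataset(
    monthly_docs: Sequence[str], monthly_price: Sequence[float]
) -> Tuple[List[Counter], List[int]]:
    """Pair month t's text features with the up (1) / down (0) trend into month t+1."""
    X, y = [], []
    for t in range(len(monthly_docs) - 1):
        X.append(ngram_counts(monthly_docs[t]))
        y.append(1 if monthly_price[t + 1] > monthly_price[t] else 0)
    return X, y
```

The last month is dropped because its following-month trend is not yet known. In practice the counts would be vectorised over a shared vocabulary and reweighted (for example with tf-idf) before fitting a binary classifier.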
Embedded Deep Learning for Sleep Staging
Engin Türetken, Ricard Delgado-Gonzalo
Although deep learning (DL) is rapidly advancing into the world of the Internet of Things (IoT), it has not yet fully entered the field of m-Health. The main reasons include the high computational demands of DL algorithms and the inherent resource limitations of wearable devices. In this paper, we present initial results for two deep learning architectures used to diagnose and analyze sleep patterns, and we compare them with a previously presented hand-crafted algorithm. The algorithms are designed to be reliable enough for consumer healthcare applications and to be integrated into low-power wearables with limited computational resources.
TCMD: A Two-Tier Classification Model for Anomaly-based Detection in IoT
The Internet of Things (IoT) is a new technology paradigm that refers to distributed physical devices connected to the Internet. The large amount of data generated by these devices poses a challenge. This data can contain anomalies, or abnormal behaviour, for a number of reasons, such as sensor faults or attacks. Moreover, the data collected from IoT devices is usually unlabelled, which means that the normal and anomaly classes are unknown. This study proposes TCMD, a two-tier classification model for anomaly detection in IoT, and describes the validation methods used to evaluate the quality of the clustering and the performance of the classification. First, TCMD employs hierarchical affinity propagation (HAP) clustering to group the data into normal and anomaly clusters. Second, the labelled data obtained from the clustering is used to train decision trees (DTs). The results show that TCMD is able to label the data, which can help reduce human intervention. In addition, in terms of false positive rate (FPR), TCMD performs well compared with DTs on the original dataset and outperforms the state-of-the-art model.
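A toy version of the two-tier idea (cluster to obtain labels, then train a tree on those labels) is sketched below for one-dimensional sensor readings. It swaps in simpler components than TCMD: a 2-means clustering stands in for hierarchical affinity propagation, a single-split decision stump stands in for full decision trees, and the smaller cluster is assumed to be the anomalous one. All function names are hypothetical.

```python
from typing import Callable, List, Sequence


def two_means_1d(values: Sequence[float], iters: int = 20) -> List[int]:
    """Cluster 1-D values into two groups (stand-in for HAP clustering)."""
    centers = [min(values), max(values)]  # start from the extremes
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in values]
        for c in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def label_by_clustering(values: Sequence[float]) -> List[int]:
    """Tier 1: mark the smaller cluster as anomalous (1), the rest as normal (0)."""
    clusters = two_means_1d(values)
    anomalous = 0 if clusters.count(0) < clusters.count(1) else 1
    return [1 if c == anomalous else 0 for c in clusters]


def fit_stump(values: Sequence[float], labels: Sequence[int]) -> Callable[[float], int]:
    """Tier 2: learn a one-split decision tree from the cluster-derived labels.

    Needs at least two readings so that one candidate threshold exists.
    """
    pairs = sorted(zip(values, labels))
    best = None  # (errors, threshold, label predicted above the threshold)
    for i in range(len(pairs) - 1):
        t = (pairs[i][0] + pairs[i + 1][0]) / 2
        for above in (1, 0):
            errors = sum((above if v > t else 1 - above) != lab for v, lab in pairs)
            if best is None or errors < best[0]:
                best = (errors, t, above)
    _, threshold, above = best
    return lambda v: above if v > threshold else 1 - above
```

With `readings = [1.0, 1.1, 0.9, 1.05, 10.0]`, `label_by_clustering(readings)` gives `[0, 0, 0, 0, 1]`, and the stump fitted on those labels classifies a new reading of `9.0` as anomalous (1) and `1.0` as normal (0). The second tier matters because it turns cluster membership into a rule that can classify new readings without re-clustering.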
Machine Learning for Position Detection in Football
Martin Frey, Marcel Dettling
Newly developed wearable tracking devices allow football players to log position and motion data during games. These data can be exploited to enhance players' performance. Our goal is to predict spatial player positions, which supports advanced tactical analyses. To this end, we compare three machine-learning approaches: Random Forest, Gradient Boosting (xgboost), and a Convolutional Neural Network (CNN) with locally connected layers. All three are based on the absolute position in the x- and y-directions, as well as the x- and y-positions relative to the center of the team. From these positions we generated heatmaps as inputs for the CNN, and calculated the median and MAD for xgboost and the Random Forest. All three approaches yielded similar accuracy, just above 80%. Using relative positions gave a large boost in accuracy.
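The relative-position features for the tree-based models can be illustrated with a short sketch. It assumes the team center in each frame is the mean (x, y) of all tracked teammates, including the player, and it reads MAD as the median absolute deviation; the abstract states neither. The helper names are hypothetical, and the four returned numbers would form one input row for a model such as the Random Forest or xgboost.

```python
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def relative_track(player: Sequence[Point], team_frames: Sequence[Sequence[Point]]) -> List[Point]:
    """Express each of the player's positions relative to the team center in that frame."""
    rel = []
    for (x, y), team in zip(player, team_frames):
        cx = sum(p[0] for p in team) / len(team)  # assumed: team center = mean position
        cy = sum(p[1] for p in team) / len(team)
        rel.append((x - cx, y - cy))
    return rel


def median_mad(values: Sequence[float]) -> Tuple[float, float]:
    """Median and (assumed) median absolute deviation."""
    m = median(values)
    return m, median([abs(v - m) for v in values])


def position_features(player: Sequence[Point], team_frames: Sequence[Sequence[Point]]) -> List[float]:
    rel = relative_track(player, team_frames)
    mx, madx = median_mad([p[0] for p in rel])
    my, mady = median_mad([p[1] for p in rel])
    return [mx, madx, my, mady]
```

Median and MAD are robust to brief excursions (a full-back overlapping once, for example), so they summarise a player's typical zone better than a mean and standard deviation would.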
Unsupervised Anomaly Detection for Seasonal Time Series
Leandro von Werra, Lewis Tunstall
We extend eBay’s Atlas algorithm to automatically detect anomalies in unlabeled, seasonal time series data. The resulting algorithm, named MULDER, derives a “surprise” metric from the time series, which is then analysed statistically for anomalies. We evaluate the efficacy of MULDER on the Numenta Anomaly Benchmark and calibrate it for deployment by injecting anomalies into production data. We find that MULDER can be used to create alerts with a low false positive rate and that it outperforms several popular open-source implementations.
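The general idea of scoring each point by how surprising it is relative to the same seasonal phase in earlier periods can be sketched as below. This is a generic robust z-score illustration, not eBay's Atlas or MULDER. It assumes a known period, compares each value with the median of the values one, two, ..., `n_periods` periods earlier, scales by the median absolute deviation times 1.4826 (the usual factor that makes MAD consistent with the standard deviation for normally distributed data), and flags points above a hypothetical threshold of 3.5.

```python
from statistics import median
from typing import List, Optional, Sequence


def seasonal_surprise(
    series: Sequence[float],
    period: int,
    n_periods: int = 3,
    min_scale: float = 1e-6,  # avoids division by zero when history is constant
) -> List[Optional[float]]:
    """Robust z-score of each point against the same phase in earlier periods."""
    scores: List[Optional[float]] = [None] * len(series)  # None = not enough history
    for t in range(period * n_periods, len(series)):
        history = [series[t - k * period] for k in range(1, n_periods + 1)]
        m = median(history)
        mad = median([abs(h - m) for h in history])
        scale = max(1.4826 * mad, min_scale)
        scores[t] = abs(series[t] - m) / scale
    return scores


def flag_anomalies(scores: Sequence[Optional[float]], threshold: float = 3.5) -> List[int]:
    """Indices whose surprise exceeds the (hypothetical) alert threshold."""
    return [t for t, s in enumerate(scores) if s is not None and s > threshold]
```

On the alternating series `[10, 20, 11, 21, 9, 19, 10, 50]` with `period=2`, index 6 scores 0.0 while index 7 (50 where about 20 is expected) scores roughly 20, so only index 7 is flagged. Comparing against the same seasonal phase is what keeps a regular daily or weekly peak from being reported as an anomaly.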
Entity Matching on Unstructured Data: An Active Learning Approach
With the growing number of data sources in enterprises, entity matching has become a crucial part of every data integration project. To reduce the human effort involved in identifying matching entities across different database tables, machine learning algorithms are typically applied. Moreover, active learning is often combined with supervised machine learning methods to further reduce the effort of labeling entities as true or false matches. However, while state-of-the-art active learning algorithms have proven to work well on structured data sets, unstructured data still poses a challenge for entity matching. This paper proposes an end-to-end entity matching pipeline that minimizes the human labeling effort for entity matching on unstructured data sets. We use several natural language processing techniques, such as soft tf-idf, to pre-process the record pairs before classifying them with a novel Active Learning with Uncertainty Sampling (ALWUS) algorithm. We designed our algorithm as a plugin system that works with any state-of-the-art classifier, such as support vector machines, random forests, or deep neural networks. Detailed experimental results demonstrate that our end-to-end entity matching pipeline clearly outperforms comparable entity matching approaches on an unstructured real-world data set. Our approach achieves significantly better F1-scores while requiring 1 to 2 orders of magnitude less human labeling effort than existing state-of-the-art algorithms.
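Uncertainty sampling, the selection rule named in ALWUS, can be illustrated generically: in each round, fit a model on the pairs labelled so far and ask the human to label the unlabelled record pair whose predicted match probability is closest to 0.5. The sketch below shows only that generic loop, not the authors' ALWUS algorithm or their soft tf-idf preprocessing. `fit` is assumed to be any function that returns a match-probability model, in the spirit of the plugin design described above, and all names are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Model = Callable[[Hashable], float]  # returns P(match) for a record pair


def select_most_uncertain(unlabeled: Sequence[Hashable], model: Model) -> Hashable:
    """Pick the pair whose predicted match probability is closest to 0.5."""
    return min(unlabeled, key=lambda pair: abs(model(pair) - 0.5))


def uncertainty_sampling(
    pool: Sequence[Hashable],
    seed_labels: Dict[Hashable, int],
    fit: Callable[[Dict[Hashable, int]], Model],
    oracle: Callable[[Hashable], int],
    budget: int,
) -> Tuple[Dict[Hashable, int], List[Hashable]]:
    """Generic active-learning loop; returns all labels and the queried pairs."""
    labeled = dict(seed_labels)
    queries = []
    for _ in range(budget):
        unlabeled = [p for p in pool if p not in labeled]
        if not unlabeled:
            break
        model = fit(labeled)  # retrain on everything labelled so far
        query = select_most_uncertain(unlabeled, model)
        labeled[query] = oracle(query)  # ask the human annotator
        queries.append(query)
    return labeled, queries
```

Because the classifier enters only through `fit`, any model that outputs a match probability (an SVM with calibrated scores, a random forest, a neural network) can be swapped in. The labeling savings come from spending each human label on the pair the current model finds hardest.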
Leak detection using Random Forest and pressure simulation
Lucien Aymon, Francesco Carrino
The purpose of this project is to monitor leakage and consumption in a non-pressurized agricultural irrigation system using only inexpensive, easily installed pressure sensors. We modeled the water network to automatically simulate leaks at random locations in the network. The resulting simulated pressures serve as a dataset to train, test, and validate a Random Forest algorithm that detects the leaks.