Imputation and classification of time series with missing data using machine learning
Permanent link
https://hdl.handle.net/10037/21916
Date
2021-06-21
Type
Master thesis
Author
Dretvik, Vilde Fonn
Abstract
This work addresses the classification of time series with missing data using imputation and selected machine learning algorithms and methods. Imputation was used to replace missing values in two data sets: one containing surgical site infection (SSI) data, consisting of 11 types of blood samples from patients over 20 days, and one called uwave, which contains 3D accelerometer data of several patterns made by a subset of people, from which two patterns were selected. The SSI data set is known to possess informative missingness. For the uwave data, missing data was simulated by removing data points in an informative (not random) way. The Dynamic Time Warping (DTW) and Euclidean distances were computed for each imputed data set to make distance grid matrices, which were used to perform classification with the K Nearest Neighbour (KNN) classifier and the Support Vector Machine (SVM) classifier. Furthermore, the data set features were augmented by adding masks that indicate the presence of missing data and counters of consecutive spells of missing data, to help exploit informative missingness. The augmented data set was classified using the same classifiers and distance measures, in addition to a newer classifier called the Temporal Convolution Network (TCN), which used the augmented data in combination with imputation of the original data. It was found that applying DTW was unnecessary for the KNN classifier, and that the Euclidean distance was sufficient. Augmenting the data was found to improve the overall results for the SVM and KNN classifiers. The TCN was found to need more work, as it gave unstable test results with much lower values than the validation results would imply.
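As an illustration of the augmentation described in the abstract, the following is a minimal sketch of how a missingness mask and a counter of consecutive missing values could be built for a single univariate series. It is not taken from the thesis: the function names (`mask_and_counter`, `forward_fill`, `augment`), the convention that the mask is 1 where a value is missing, the exact counter definition, and the use of forward-fill imputation are all assumptions made for this example.

```python
import numpy as np


def mask_and_counter(x):
    """Return (mask, counter) for a 1-D series where NaN marks a missing value.

    Assumed conventions (hypothetical, not from the thesis):
    mask[t] is 1 if x[t] is missing and 0 otherwise;
    counter[t] is the length of the run of missing values ending at t,
    and 0 when x[t] is observed.
    """
    x = np.asarray(x, dtype=float)
    mask = np.isnan(x).astype(int)
    counter = np.zeros_like(mask)
    run = 0
    for t, missing in enumerate(mask):
        run = run + 1 if missing else 0
        counter[t] = run
    return mask, counter


def forward_fill(x, fallback=0.0):
    """Impute each missing value with the last observed value.

    Leading missing values are replaced by `fallback`. Forward fill is only
    one possible imputation method, chosen here for brevity.
    """
    x = np.asarray(x, dtype=float).copy()
    last = fallback
    for t in range(len(x)):
        if np.isnan(x[t]):
            x[t] = last
        else:
            last = x[t]
    return x


def augment(x):
    """Stack the imputed series, the mask, and the counter as feature columns."""
    mask, counter = mask_and_counter(x)
    return np.column_stack([forward_fill(x), mask, counter])


series = [1.0, np.nan, np.nan, 4.0, np.nan]
features = augment(series)  # shape (5, 3): imputed value, mask, counter
```

Each row of `features` then carries the imputed value together with information about whether, and for how long, the original value was missing, which is the kind of signal a classifier could use when the missingness itself is informative.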
Publisher
UiT The Arctic University of Norway
Copyright 2021 The Author(s)
Unless otherwise stated, the license for this item is described as Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Related items
Showing items related by title, author, and subject.
- Geometric Modeling- and Sensor Technology Applications for Engineering Problems
Pedersen, Aleksander (Doctoral thesis, 2020-10-20)
In applications for technical problems, Geometric modeling and sensor technology are key in both scientific and industrial development. Simulations and visualization techniques are the next step after defining geometry models and data types. This thesis attempts to combine different aspects of geometric modeling and sensor technology as well as to facilitate simulation and visualization. It includes ...
- Engineering methods for enhancing railway geometry and winter road assessment: A safety and maintenance perspective
Brustad, Tanita Fossli (Doctoral thesis, 2020-06-22)
In many areas around the world there are limited transportation possibilities when travelling between key cities. If these areas also experience demanding weather conditions or geography, getting from A to B, during difficult conditions, is usually not optimal in regards to accessibility, safety, and comfort. Under challenging conditions, two essential elements in strengthening accessibility, safety, ...
- Iceberg Drift-Trajectory Modelling and Probability Distributions of the Predictions
Baadshaug, Ole (Master thesis, 2018-06-29)
Moving icebergs represent a major problem for shipping, as well as for oil and gas installations in ice infested waters. To be able to take actions against hazardous icebergs, it is necessary to develop models for prediction of iceberg drift trajectories. Many models have been developed in order to do so, using different approaches. These approaches can be divided into two main categories, dynamic ...