Introducing Soft Option-Critic for Blood Glucose Control in Type 1 Diabetes: Exploiting Abstraction of Actions for Automated Insulin Administration
Permanent link: https://hdl.handle.net/10037/19549
Date: 2020-07-15
Type: Master thesis
Author: Jenssen, Christian

Abstract
Type 1 Diabetes (T1D) is an autoimmune disease where the insulin-producing cells are damaged and unable to produce sufficient amounts of insulin, causing an inability to regulate the body's blood sugar levels.
Administering insulin is necessary for blood glucose regulation, requiring diligent and continuous care from the patient to avoid critical health risks. The dynamics governing insulin-glucose interaction are complex, and aspects such as diet, exercise, and sleep have a substantial effect, making management a difficult burden for the patient.
Reinforcement learning (RL) has been proposed as a solution for automated insulin administration, with the potential to learn personalized insulin control policies adapted to the patient. In this thesis, policy-based RL methods for T1D management are investigated and a new method, Soft Option-Critic (SOC), is developed. SOC is designed to better account for the differing situations affecting blood glucose by using temporally extended actions called options. Further extensions of the method are implemented, incorporating key elements from deep Q-learning algorithms.
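The options formalism pairs each option with an intra-option policy and a termination condition, while a policy over options selects a new option whenever the current one terminates. A minimal sketch of this generic call-and-return execution loop, assuming a toy state-action interface (all names here are illustrative, not the thesis's SOC implementation):

```python
import random

class Option:
    """A temporally extended action: an intra-option policy plus a
    termination condition (hypothetical minimal interface)."""
    def __init__(self, policy, termination_prob):
        self.policy = policy                       # state -> primitive action
        self.termination_prob = termination_prob   # chance of ending each step

    def act(self, state):
        return self.policy(state)

    def terminates(self, rng):
        return rng.random() < self.termination_prob

def run_episode(env_step, init_state, options, select_option,
                horizon=10, seed=0):
    """Call-and-return execution: pick an option, follow its intra-option
    policy until it terminates, then pick a new option."""
    rng = random.Random(seed)
    state, current = init_state, None
    trajectory = []
    for _ in range(horizon):
        # Re-select only when no option is active or the current one ends.
        if current is None or current.terminates(rng):
            current = select_option(state, options)  # policy over options
        action = current.act(state)
        state = env_step(state, action)
        trajectory.append((state, action))
    return trajectory
```

In an option-critic style agent, the intra-option policies, termination probabilities, and the policy over options would all be learned jointly rather than fixed as here.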
The experiments are twofold, thoroughly assessing the performance of SOC and its extensions. The first part is conducted on the already-solved Lunar Lander (LL) environment to analyze the merits of using options in the SOC formulation. The second part consists of diabetes experiments on in-silico T1D patients using an insulin-glucose simulator, including scenarios with varying meals and boluses. The results show that SOC and its extensions outperform the benchmark algorithms on LL, learning options that improve sample efficiency. On the diabetes experiments they performed comparably to the best benchmark model, beating the optimal baseline control method. The resulting policy was able to predict and account for meals, improving time-in-range (TIR) substantially.
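Time-in-range is commonly reported as the fraction of glucose readings that fall inside the clinical target band of 70-180 mg/dL. A minimal sketch of that computation (the band is supplied as default parameters here, not a value taken from the thesis):

```python
def time_in_range(glucose_mgdl, low=70.0, high=180.0):
    """Fraction of glucose readings inside the [low, high] target band.

    glucose_mgdl: iterable of blood glucose readings in mg/dL.
    Returns a value in [0, 1]; multiply by 100 for a percentage.
    """
    readings = list(glucose_mgdl)
    if not readings:
        raise ValueError("time_in_range requires at least one reading")
    in_band = sum(low <= g <= high for g in readings)
    return in_band / len(readings)
```

For example, `time_in_range([60, 100, 150, 200])` returns `0.5`, since two of the four readings lie in the band.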
Publisher: UiT The Arctic University of Norway
Copyright 2020 The Author(s)