Introducing Soft Option-Critic for Blood Glucose Control in Type 1 Diabetes: Exploiting Abstraction of Actions for Automated Insulin Administration
Permanent link: https://hdl.handle.net/10037/19549
Date: 2020-07-15
Type: Master thesis
Author: Jenssen, Christian

Abstract
Type 1 Diabetes (T1D) is an autoimmune disease where the insulin-producing cells are damaged and unable to produce sufficient amounts of insulin, causing an inability to regulate the body's blood sugar levels.
Administering insulin is necessary for blood glucose regulation, requiring diligent and continuous care from the patient to avoid critical health risks. The dynamics governing insulin-glucose interaction are complex, and aspects such as diet, exercise, and sleep have a substantial effect, making management a difficult burden for the patient.
Reinforcement learning (RL) has been proposed as a solution for automated insulin administration, with the potential to learn personalized insulin control policies adapted to the patient. In this thesis, policy-based RL methods for T1D management are investigated and a new method, Soft Option-Critic (SOC), is developed. SOC is designed to better account for the differing situations affecting blood glucose by using temporally extended actions called options. Further extensions of the method are implemented, incorporating key elements from deep Q-learning algorithms.
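The options formalism pairs each option with an intra-option policy and a termination condition, while a policy over options selects a new option whenever the current one terminates. A minimal sketch of this generic call-and-return execution loop, assuming a toy state-action interface (all names here are illustrative, not the thesis's SOC implementation):

```python
import random

class Option:
    """A temporally extended action: an intra-option policy plus a
    termination condition (hypothetical minimal interface)."""
    def __init__(self, policy, termination_prob):
        self.policy = policy                       # state -> primitive action
        self.termination_prob = termination_prob   # chance of ending each step

    def act(self, state):
        return self.policy(state)

    def terminates(self, rng):
        return rng.random() < self.termination_prob

def run_episode(env_step, init_state, options, select_option,
                horizon=10, seed=0):
    """Call-and-return execution: pick an option, follow its intra-option
    policy until it terminates, then pick a new option."""
    rng = random.Random(seed)
    state, current = init_state, None
    trajectory = []
    for _ in range(horizon):
        # Re-select only when no option is active or the current one ends.
        if current is None or current.terminates(rng):
            current = select_option(state, options)  # policy over options
        action = current.act(state)
        state = env_step(state, action)
        trajectory.append((state, action))
    return trajectory
```

In an option-critic style agent, the intra-option policies, termination probabilities, and the policy over options would all be learned jointly rather than fixed as here.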
The experiments are twofold, thoroughly assessing the performance of SOC and its extensions. The first part is conducted on the already-solved Lunar Lander (LL) environment to analyze the merits of using options in the SOC formulation. The second part consists of diabetes experiments on in-silico T1D patients using an insulin-glucose simulator, including scenarios with varying meals and boluses. The results show that SOC and its extensions outperform the benchmark algorithms on LL, learning options that improve sample efficiency. On the diabetes experiments they performed comparably to the best benchmark model, beating the optimal baseline control method. The resulting policy was able to predict and account for meals, improving time-in-range (TIR) substantially.
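Time-in-range is commonly reported as the fraction of glucose readings that fall inside the clinical target band of 70-180 mg/dL. A minimal sketch of that computation (the band is supplied as default parameters here, not a value taken from the thesis):

```python
def time_in_range(glucose_mgdl, low=70.0, high=180.0):
    """Fraction of glucose readings inside the [low, high] target band.

    glucose_mgdl: iterable of blood glucose readings in mg/dL.
    Returns a value in [0, 1]; multiply by 100 for a percentage.
    """
    readings = list(glucose_mgdl)
    if not readings:
        raise ValueError("time_in_range requires at least one reading")
    in_band = sum(low <= g <= high for g in readings)
    return in_band / len(readings)
```

For example, `time_in_range([60, 100, 150, 200])` returns `0.5`, since two of the four readings lie in the band.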
Publisher: UiT The Arctic University of Norway
Copyright 2020 The Author(s)