User Satisfaction in Augmented Reality-based Training using Microsoft HoloLens

With the recent developments in augmented reality (AR) technologies comes an increased interest in the use of smart glasses for hands-on training. Whether this interest is turned into market success or not depends at the least on whether the interaction with smart AR glasses satisfies users, an aspect of AR use that so far has received little attention. With this contribution, we seek to change this. The objective of the article, therefore, is to investigate user satisfaction in AR applied to three cases of practical use. User satisfaction of AR can be broken down into satisfaction with the interaction and satisfaction with the delivery device. A total of 142 participants from three different industrial sectors contributed to this study, namely, aeronautics, medicine, and astronautics. In our analysis, we investigated the influence of different factors, such as age, gender, level of education, level of Internet knowledge, and the roles of the participants in the different sectors. Even though users were not familiar with the smart glasses, results show that general computer knowledge has a positive effect on user satisfaction. Further analysis using two-factor interactions shows that there is no significant interaction between the different factors and user satisfaction. The results of the study affirm that the questionnaires developed for user satisfaction of smart glasses and the AR application performed well, but leave room for improvement.


Introduction
Augmented Reality (AR) means enhancing the user's perception "with additional, artificially generated sensory input to create a new experience including, but not restricted to, enhancing human vision by combining natural with digital offers" (Wild et al., 2018). Augmented Reality typically has three characteristics [1]: first, AR combines the virtual with the real world; second, objects are registered from both the real and virtual world in one coordinate system; third, the interaction between the objects of both worlds is possible in real time.
Hands-on training is important for many disciplines and professions, such as medical workers, mechanics, technicians, electricians, engineers, sailors, pilots, and firefighters. In the past decade, AR has been increasingly employed for a number of training applications, such as medical education [2], rehabilitation engineering [3], automotive safety [4], task assistance [5], and manufacturing [6].
For the successful adoption of AR-based training across different domains, one of the key factors is user satisfaction. User satisfaction is defined as a combination of different factors associated with the usage of the AR application and the associated delivery device [7]. These factors include: a feeling of powerfulness and achievement; an efficient use of time, effort, and other resources; meaningful content; a better insight into the training environment; a natural interaction; a feeling of amazement; performance that exceeds expectations; playfulness; the invoking of positive feelings and pleasing memories; immersion and engagement; a transparent interaction; the feeling of participation in a community; a sense of privacy of the user's content; inspiration, encouragement, and motivation; and, finally, artistic creativity [7].
The rest of this paper is organized as follows. First, we turn to the state of the art, summarizing what the research has found so far with respect to AR user interaction, AR user satisfaction, and questionnaires used for evaluating user satisfaction. Next, the AR app used in the trials is described. Subsequently, we present the research methodology and a summary of the participants, devices, design of trial tasks, and evaluation methods. Finally, findings and results are illustrated, and the discussion and conclusion are given at the end.
The main objective of this study is to test and observe user satisfaction in using AR applications and AR glasses. The evaluation methods include questionnaires and interviews. The AR app used in this evaluation, therefore, has two parts: in one, an expert records their experience in the workplace; in the other, novices train on work-related procedures using these recordings.
In this study, we addressed the following research questions: whether experts and students are satisfied with the prototype application, whether the application can increase interest in learning new skills, and whether users find the application easy to use.

AR user interaction
AR technologies provide a different user experience than that of, for example, mobile phone apps. The user interacts with the surrounding real world, combining inputs from the environment with digital augmentations. Popular examples include Pokémon GO and Snapchat. These types of apps certainly brought the term "augmented reality" into the spotlight [8]. With the advent of consumer-grade AR glasses, different types of AR user interactions are becoming necessary. For example, a user who is wearing Microsoft's HoloLens can communicate diagrams and other types of graphics directly embedded into the environment to a different, remote user (see Figure 1).

AR user satisfaction and questionnaires for evaluating user satisfaction
AR technology has evolved from offline to online, from static devices to mobile devices, and from desktop and mobile to wearable devices [10]. Consequently, with AR development over the past decade or so, special attention has been drawn to the maximization of AR user satisfaction. AR user satisfaction is dependent on both the design of the user interface (UI) and the choice of the AR hardware. Personalization of AR glasses can lead to greater AR user satisfaction [11]. AR apps designed for a good user experience result in greater overall user satisfaction. This applies to AR navigation apps, AR health apps, AR education apps, and certain AR smart glasses games [12].
There are several concepts and subjective measures for evaluating the user experience of AR. One of them is the Questionnaire for User Interaction Satisfaction (QUIS), which is designed to assess users' subjective satisfaction with specific aspects of the human-computer interface [13]. QUIS was developed to address reliability and validity problems found in earlier satisfaction measures. As a result, the measure is highly reliable across many types of interfaces.
QUIS consists of a demographic questionnaire, a six-scale measure of overall system satisfaction, and hierarchically organized measures. The measures include the following specific interface factors [13]: screen factors, terminology and system feedback, learning factors, system capabilities, technical manuals, online tutorials, multimedia, teleconferencing, and software installation. Each area is measured by a 7-point scale according to the user's overall satisfaction with the interface and the above factors [13].

The AR application
The trials of the project investigated how satisfied users are with the novel method and the AR glasses. The app is designed for HoloLens and has two modes. One is called the WEKIT (Wearable Experience for Knowledge Intensive Training) recorder and the other is called the WEKIT player.
The recorder mode tracks and records the performance of the experts. To create the required instruction for a procedure, experts can create annotations for each action step of the procedure. These annotations can then be played back to the trainees. Several types of annotations can be added in this app. Figure 2 shows the user interface (UI) of the recording application. Each icon represents a different type of annotation. Figure 3 shows a ghost track recording and replay.
The WEKIT player is the mode designed for trainees to learn the operation that was recorded before. Each scenario has a different recording and different tasks. Therefore, the app starts by recognizing an Augmented Reality marker and then decides which scenario is going to be used in the subsequent tasks. Markers are always detected by the front camera on HoloLens. The WEKIT player starting screen is shown in Figure 4. Once the task starts, the annotations are shown in HoloLens. From the perspective of the users, the annotations are overlaid on the physical facilities. They guide the user through the task step by step. Gesture commands, voice commands, and the physical HoloLens click button are all available when using the app. Figure 5 shows an example of using the WEKIT player to do the task.
A total of 47 experts (8 females; 39 males) with a high level of technological competency in their respective fields were recruited. A total of 95 learners (23 females; 72 males) from different fields, including medicine, engineering, and aerospace, voluntarily participated in the trials. The largest group of participants (68) was in the 18-24 age group, followed by 48 participants in the range between 25 and 34. Most of the participants had moderate or better computer knowledge and internet knowledge. Here, computer knowledge and internet knowledge were each rated on five levels: very poor, poor, moderate, good, and very good. All participants gave written consent for the trials.

Material and Apparatus
The trial used Microsoft HoloLens as wearable AR glasses for assessing the user's satisfaction with AR training. There are two parts in the WEKIT technology platform [16] deployed on HoloLens. One is a recorder for capturing expert experience and the other is a player for presenting the expert's experience to the trainees. During the trial, all interactions and manipulations were done by using gesture and voice commands only.

Trial design/task
The trial tasks were separated into three different areas, as mentioned in Section 4.1.
Tasks in the aeronautics use case were performed at Lufttransport, Norway. A pre-flight inspection task was performed with the air ambulance plane Beechcraft B200 [17]. The experts comprised maintenance apprentices, skilled workers (mechanics), and technicians working on base maintenance at Lufttransport. As an example, an expert wearing HoloLens and using the application to record their voice, image, and movement is shown in Figure 6. The novice group comprised student volunteers from UiT The Arctic University of Norway [17]. They followed the instructions from the application on HoloLens for completing the task.
The medical task involved imaging and diagnostic workers and was conducted at EBIT (Esaote's Healthcare IT Company) in Genoa, Italy [18]. This task was for training medical students and radiologist apprentices on using MyLab8, an ultrasound machine produced by ESAOTE [19]. Similar to the trial at Lufttransport, the experts added audio recordings, pictures, annotations, and 3D models by using the recorder application. The novices performed the task based on the recorded expert's experience. In Figure 7, we can see a novice performing a task by positioning the probe in the target direction and taking measurements using the player application.
The space task, conducted at the ALTEC facility in Turin, involved training astronauts to install the Temporary Stowage Rack (TSR). TSR installation is a procedure that the astronauts have to perform on the International Space Station (ISS) [20]. Similar to the trials at the other two organizations, the experts designed the training scenario and added annotations by using the recorder application. The novices performed the task based on the recorded content. In Figure 8, we can see an astronaut trainer performing a task in a replica training module of the International Space Station.

Smart Glasses User Satisfaction (SGUS)
The Smart Glasses User Satisfaction (SGUS) questionnaire was created for the WEKIT trials. It is a tool designed to assess users' subjective satisfaction with smart glasses. SGUS is a method and measure to scrutinize aspects such as an enhanced perception of the environment, interaction with the augmented environment, implications of location and object awareness, the user-created AR content, and the new AR features that users typically use [7]. The general objective of the questionnaire is to understand the potential end users' central expectations of AR services with smart glasses, especially from an experiential point of view [7]. In this study, the smart glasses used for the different use cases were Microsoft HoloLens. SGUS measures subjective satisfaction on the basis of different features associated with user satisfaction, such as the content and interaction with the content. SGUS is based on evaluation criteria for web-based learning [14] and statements evaluating the user experience of mobile augmented reality services [7]. SGUS consists of 11 items (statements) on a 7-point Likert scale (1-7) [17]. The 11 statements cover three categories of evaluation criteria: general interface usability criteria, AR interaction-specific criteria for an educational AR app, and learner-centered effective learning [14].

Questionnaire for User Interaction Satisfaction (QUIS)
The Questionnaire for User Interaction Satisfaction (QUIS) measures subjective satisfaction with specific aspects of the interface and interaction between the user and the AR application [21]. In this study, QUIS was modified for AR glasses, i.e., HoloLens. Hence, a questionnaire with 15 items was used. In order to maintain consistency with the survey in other sections, each item was mapped to a numeric value of 1-7 instead of the 9-point scale [21].

Procedure
As most participants had no experience with AR glasses, at the beginning of the trial, they were asked to familiarize themselves with the AR glasses, i.e., HoloLens. To this end, gesture training with HoloLens was done before they started using the application. The application comprised a scenario that the participants had to complete in a particular use case setting. The content of the application was generated by experts in that specific use case. After the participants completed all the tasks, they were given the QUIS and SGUS questionnaires to complete.

Descriptive statistics
In this section, descriptive statistics for SGUS and QUIS are described.

SGUS
As mentioned before, SGUS has 11 items. The sum of the scores for the 11 items is the SGUS score. Table 1 provides the following data: n (number of participants), mean, standard deviation, minimum value, Q1 (the first quartile: "middle" value in the first half of the rank-ordered data set), median, Q3 (the third quartile: "middle" value in the second half of the rank-ordered data set), and maximum value for the following variables: gender, education level, roles, and organizations.
Based on these results, it is clear that the mean scores are similar across the different levels associated with the variables.
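The scoring and the summary statistics described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function names, the handling of an odd number of scores when splitting the data into halves, and the example scores are all assumptions made for this sketch.

```python
from statistics import mean, median, stdev

N_ITEMS = 11          # SGUS has 11 items (from the text)
SCALE = range(1, 8)   # 7-point Likert scale, values 1-7 (from the text)


def sgus_score(responses):
    """Sum the 11 item responses of one participant into an SGUS score."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r not in SCALE for r in responses):
        raise ValueError("each response must be an integer from 1 to 7")
    return sum(responses)


def describe(scores):
    """Return n, mean, SD, min, Q1, median, Q3, and max for a list of scores.

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data. For an odd n, the overall median is left out of both halves; this
    convention is an assumption, since the text does not specify it.
    """
    ordered = sorted(scores)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two scores")
    half = n // 2
    lower, upper = ordered[:half], ordered[n - half:]
    return {
        "n": n,
        "mean": mean(ordered),
        "sd": stdev(ordered),
        "min": ordered[0],
        "q1": median(lower),
        "median": median(ordered),
        "q3": median(upper),
        "max": ordered[-1],
    }


# Hypothetical scores, invented for illustration only (not study data).
example = describe([44, 50, 55, 60, 62, 66, 70])
```

With 11 items on a 1-7 scale, an SGUS score can range from 11 to 77, which gives a frame of reference for the means reported in Table 1.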

Correlation
In this section, we discuss correlation for SGUS and correlation for QUIS.
In the study of SGUS, each of the items investigates a different aspect of the user experience. For the analysis, the overall averages for all items were calculated. Figure 9 shows the plot of the average score from individual items. The box in the plot depicts the answer of 50% of the participants, with the line in the middle indicating the median. The dotted lines span the 95% confidence interval. Outliers are depicted with black dots. The connected red dots indicate the medians. The results imply that most of the participants had a good conception of what is real and what is augmented when using AR glasses (GL5). The participants indicated that the system and content helped them to accomplish the task quite well (GL7) and their attention was captivated in a positive way (GL6). The provided content was also seen as contextually meaningful (GL2). However, performing the task with AR glasses was experienced as less natural (GL9, GL4), and following and understanding the task phases (GL8, GL10-11) was not very easy. The results were very much in line across the three.
Signif. codes: *p < 0.05; **p < 0.01; ***p < 0.001

Figure 9. Plot of SGUS score for each item. Panel title: Smart Glasses User Satisfaction (All); response scale from Strongly Disagree to Strongly Agree.

Correlation of QUIS
The correlation for QUIS is based on 15 items. The results of Spearman's rank correlation are shown in Table A1 (see Appendix). The values in the table have the same meaning as in Table 3. The results are similar to those of SGUS; most of the correlations are statistically significant (p < 0.05) and low and positive. This suggests that most of the items are only weakly related to one another.
In the study of QUIS, each of the items investigated different aspects of the user experience. For the analysis, the overall average from all items was calculated. Figure 10 shows the plot of the average score from individual items, and the description of the plot is the same as that of the SGUS plot. The results imply that most of the participants agree that learning to operate the AR glasses (QS13) seemed to be rather easy, and the overall enthusiasm towards the system (QS1, QS5) seemed to be very positive. The characters on the screen were relatively easy to read (QS9). The means of QS3, QS4, QS6, QS7, and QS8 indicate that the system was experienced as rigid, unreliable, and slow, which may cause frustration [17].

Figure 10. Plot of QUIS score for each item. Panel title: User Interaction Satisfaction (All).

Analysis of variance and Interaction plots
The participants are described by seven factors: gender, age, role, education skill level, computer knowledge level, internet knowledge level, and organization. Each factor is divided into two levels, except for organization, which has three levels. Please note that none of the participants claimed to have a poor or very poor internet knowledge level. The following section discusses the analysis of variance (ANOVA) of QUIS and of SGUS. In this ANOVA study, SGUS and QUIS scores for using the application on the AR glasses were investigated with respect to six independent variables: age distribution, gender, roles, highest level of education, organization, and computer knowledge. Therefore, there are 6 main effects and 57 interactions. We are interested in whether there is a relationship between the satisfaction levels (measured by the questionnaires) and these factors.
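The count of 57 interactions follows when interactions of every order among the six factors are included: 15 two-factor, 20 three-factor, 15 four-factor, 6 five-factor, and 1 six-factor interaction. A short sketch that reproduces this count is given below; the factor labels are shortened versions of the names in the text, and the function name is chosen for this illustration only.

```python
from itertools import combinations
from math import comb

# The six independent variables named in the text (labels shortened here).
FACTORS = ["age", "gender", "role", "education", "organization",
           "computer knowledge"]


def count_interactions(n_factors):
    """Count interaction terms of every order (2 up to n_factors)."""
    return sum(comb(n_factors, k) for k in range(2, n_factors + 1))


# The two-factor interactions are the ones examined in the interaction plots.
two_factor = list(combinations(FACTORS, 2))

total = count_interactions(len(FACTORS))
```

Equivalently, the total is 2^6 - 1 - 6: all non-empty subsets of the six factors minus the six main effects.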

ANOVA of SGUS
In this study, we investigated whether age, gender, roles, computer knowledge level, or different organizations have an effect on the satisfaction of using AR glasses. To determine this, we needed to look at the simple main effects: the main effect of one independent variable (e.g., age) at each level of another independent variable (e.g., for students and for experts).
Figure 11 shows the main effects of the six factors. Participants with different computer knowledge levels show the greatest differences in the SGUS results. This means that participants with higher and lower computer knowledge levels gave different user satisfaction scores. The results show that participants with good or very good computer knowledge were, in general, more satisfied with the smart glasses application, and there is a significant effect of computer knowledge level (F value = 8.87, p = 0.003). The result implies that the SGUS score was affected by the level of computer knowledge.
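To illustrate how an F value for a single main effect is obtained, the following sketch computes a one-way ANOVA F statistic for two groups. It is a simplified, single-factor illustration under stated assumptions and not the multi-factor analysis reported here: the function name and the scores are hypothetical, and the p value is not computed because it requires the F distribution, which is not part of the Python standard library.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F is the ratio of the between-group mean square to the
    within-group mean square.
    """
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical SGUS scores, invented for illustration (not study data):
moderate_or_worse = [50, 54, 57, 52, 55]
good_or_very_good = [58, 61, 56, 63, 60]
f_value, df_b, df_w = one_way_anova_f([moderate_or_worse, good_or_very_good])
```

A large F value indicates that the difference between the group means is large relative to the spread of scores within each group.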
Table 4 shows the summary results of the linear model of the independent variables. The estimate for the model intercept is 54.688 and the coefficient measuring the slope of the relationship with computer knowledge level is 4.324. There is strong evidence that the model coefficient is significantly different from zero: as the computer skill level increases, so does the satisfaction. The information about the standard errors of these estimates is also provided in the coefficients table (Table 4).
Figure 14 shows that in all three organizations, participants with moderate or worse computer knowledge levels gave lower scores than participants with good and very good computer knowledge levels. There are no significant interactions between them.
We selected the factors of organization and computer knowledge level to investigate the interaction between them. The summary results of the linear regression model (see Table 7) show that the estimate for the model intercept is 73.533, while there is no significant interaction between the two factors. The information about the standard errors of these estimates is also provided in the coefficients table (Table 7). According to the multiple regression model, 10.6% of the variance in QUIS scores is explained by the factors together (Multiple R-squared is 0.106). The model as a whole is statistically significant (overall p value is 0.0133). Signif. codes: *p < 0.05; **p < 0.01; ***p < 0.001.
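As a reading aid for the SGUS coefficients reported above (intercept 54.688, slope 4.324), the sketch below evaluates the fitted line for the two computer knowledge groups. It assumes that computer knowledge enters the model as a binary indicator (0 for moderate or worse, 1 for good or very good) and that no other terms contribute; neither assumption is confirmed by the text, and the function name is hypothetical.

```python
INTERCEPT = 54.688   # model intercept reported for SGUS (from the text)
SLOPE = 4.324        # coefficient for computer knowledge level (from the text)
MAX_SGUS = 11 * 7    # 11 items on a 1-7 scale


def predicted_sgus(high_computer_knowledge):
    """Predicted SGUS score under the assumed 0/1 coding of the factor.

    Assumption: 0 = moderate or worse, 1 = good or very good.
    """
    level = 1 if high_computer_knowledge else 0
    return INTERCEPT + SLOPE * level


low = predicted_sgus(False)    # the intercept itself
high = predicted_sgus(True)    # intercept plus slope
share_of_max = high / MAX_SGUS # predicted score relative to the maximum of 77
```

Under these assumptions, the slope is simply the expected difference in SGUS score between the two computer knowledge groups.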

Discussion
This study established a set of norms to be used for the evaluation of satisfaction of using AR glasses and AR applications. The questionnaire items are only weakly correlated with one another, both in SGUS and in QUIS. Each questionnaire item is designed for evaluating a specific aspect of satisfaction with the smart glasses and AR applications. From the mean score of both questionnaires, we observe that most of the participants are satisfied with the AR glasses and the AR applications. It was found that the system and content helped the participants to accomplish the task quite well and their attention was captivated in a positive way. In other words, the result shows that the user interface is well designed. The user sees "useful information" displayed next to each part.
The main factors age, gender, education level, roles of the participants, and organizations do not have significant effects on the satisfaction of using smart glasses and AR applications. However, computer/internet knowledge level does influence user satisfaction. Participants who have better computer/internet knowledge are more satisfied with the smart glasses and AR applications. There is no significant interaction among these factors. Since most participants have a moderate or better level of knowledge in using computers and the internet, it can be predicted that most educated people can easily accept smart glasses and AR applications.

Conclusions
This study began by noting the scarcity of AR applications for hands-on training. As a first step toward incorporating the recorded teaching activities into learning procedures, the AR application was developed on AR glasses. In this work, the Questionnaire for Smart Glasses User Satisfaction (SGUS) and the Questionnaire for User Interaction Satisfaction (QUIS) were investigated for augmented reality applications using Microsoft HoloLens.
The results of this study show that the approach is feasible. The experts wore the AR glasses to show the process, and the activities were recorded. The AR applications can help the students learn the process. The results show that satisfaction with both teaching and learning is acceptable.
The results indicate that satisfaction does increase when participants have higher computer knowledge levels. They also show that gender, age, education level, and roles of students or experts do not have any effect on user satisfaction.

Figure 1. With Microsoft HoloLens, a user connects the wires with remote assist [9].

Figure 2. User interface of the recording mode. Image from the WEKIT consortium in 2017.

Figure 5. Example of user interface of WEKIT Player mode. Image from [15].
In order to evaluate the satisfaction of the user's interaction and the smart glasses user experience, the WEKIT application was designed for three different use cases: aviation, medical imaging, and space. In our trial experiments, the test population was divided into two main groups: experts and students. A total of 47 experts (8 females; 39 males) with a high level of technological competency in
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 25 October 2018 doi:10.20944/preprints201810.0594.v1

Figure 6. Maintenance engineer in the cockpit of a Beechcraft B200 King Air model. Image photographed by Mikhail Fominykh in 2017.

Figure 8. Astronaut trainer in a replica training module of the International Space Station. Image from the WEKIT consortium in 2017.
were Microsoft HoloLens. SGUS measures subjective satisfaction on the basis of different features associated with user satisfaction, such as the content and interaction with the content. SGUS is based

Correlation of SGUS
Spearman's correlation coefficient, ρ, measures the strength and direction of association between two ranked variables in the range [-1, 1]. Based on the 11 items, the results of Spearman's rank correlation are shown in Table 3: the first value of each row represents Spearman's correlation coefficient, and the second value of each row represents the p value. It can be seen that almost all correlations are statistically significant (p < 0.05) and low and positive. This suggests that the items are only weakly related and that each captures a largely distinct aspect of the user experience.
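As an illustration of the coefficient described above, Spearman's ρ can be computed as the Pearson correlation of the ranks of the two variables, with tied values receiving the average of their ranks (ties are common with Likert items). The sketch below is a minimal, self-contained version under these conventions; the function names and the example responses are hypothetical, and the p value is not computed.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical responses of six participants to two items (not study data).
item_a = [5, 6, 4, 7, 5, 3]
item_b = [4, 6, 5, 6, 3, 4]
rho = spearman_rho(item_a, item_b)
```

Values of ρ close to 0 indicate a weak monotonic relationship between two items, while values close to 1 or -1 indicate a strong one.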
Figure 10 shows the plot of the average score from individual items, and the description of the plot is the same as that of the SGUS plot. The results imply that most of the participants agree that learning to operate the AR glasses (QS13) seemed

Figure 12. Interaction effects plots for SGUS: (a) Different computer knowledge levels with different genders of the participants; (b) Different computer knowledge levels with different organizations of the participants; (c) Different computer knowledge levels with different roles of the participants; (d) Different computer knowledge levels with different age groups of the participants; (e) Different computer knowledge levels with different education levels of the participants.

Table 1. Descriptive statistics of the Questionnaire for Smart Glasses User Satisfaction (SGUS).

QUIS
Similarly, the overall Questionnaire for User Interaction Satisfaction (QUIS) score was calculated by summing the scores for the 15 QUIS items. Summary data for all questions in QUIS are presented in Table 2. The 15 items were designed independently from each other. These items aim to investigate the satisfaction of users with different aspects of the interface, including usability and user experience in using AR applications.

Table 2. Descriptive statistics of the Questionnaire for User Interaction Satisfaction (QUIS).

Table 3. Spearman's rank coefficient of correlation for SGUS: the first value of each row represents Spearman's correlation coefficient, and the second value of each row represents the p value.

Table 6. ANOVA results for QUIS with regard to organization, role, and computer knowledge level (reducing factors).

Table 7. Summary results of the linear model of the independent variables for QUIS.

Table A1. Spearman's rank coefficient of correlation of QUIS: the first value of each row represents Spearman's correlation coefficient, and the second value of each row represents the p value.