6.1. Test scenario
We assume that SlidAR was faster mainly due to the very specific target positions. Even though the accurate initial positioning took some effort, the position adjustment was quick and accurate because only 1 DOF was controlled. There was no need to constantly change the viewpoint, and the adjustment was not affected by unintentional movement of the device. The initial positioning with HoldAR was fast, but the position adjustment was time-consuming because 3 DOF were controlled. This made the adjustment vulnerable to unintentional movement and perceptual errors.
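The 1 DOF adjustment can be illustrated with a short sketch. This is a minimal illustration and not the authors' implementation: it assumes that the ray from the initial tap is stored in world coordinates, and the names Ray, point_on_ray, and slide are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Ray:
    """Ray fixed in world coordinates at the moment of the initial tap (assumption)."""
    origin: Vec3
    direction: Vec3  # assumed to be a unit vector


def point_on_ray(ray: Ray, t: float) -> Vec3:
    """Return the world position at distance t along the ray."""
    ox, oy, oz = ray.origin
    dx, dy, dz = ray.direction
    return (ox + t * dx, oy + t * dy, oz + t * dz)


def slide(t: float, delta: float, t_min: float = 0.0) -> float:
    """1 DOF adjustment: a slide gesture changes only the scalar t.

    The value is clamped so the object cannot move behind the ray origin.
    """
    return max(t_min, t + delta)


# Hypothetical example: object placed 1.0 unit along a ray, then slid 0.5 further.
ray = Ray(origin=(0.0, 0.0, 0.0), direction=(0.0, 0.0, 1.0))
t = slide(1.0, 0.5)
position = point_on_ray(ray, t)
```

Because the ray does not depend on the current device pose in this sketch, moving or shaking the device changes only the view of the object, not its position. A 3 DOF adjustment driven by the device pose has no such constraint.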
A direct tap gesture is very intuitive for initial positioning, but it suffers from the ambiguity caused by the user's fingers blocking the screen and from the shakiness of handheld devices. This can be an issue in SlidAR if target positions in the real world are very small, in which case the initial positioning has to be very precise. The initial positioning with a tap gesture could be improved with view freezing [27] or with a combination of view freezing and Shift [38].
Unlike SlidAR, HoldAR does not require precise initial positioning because the target position does not need to be on the ray cast from the camera. However, according to participants' comments, HoldAR requires more mental effort if they have to determine the initial position based on how effectively they can translate the virtual object from the initial position to the target position.
Perceptual issues [2,39] can have a considerable effect on positioning accuracy when target positions are real instead of virtual. The combined average error rate in all conditions (M = 12.8 mm, SD = 1.3 mm) may be due to issues in perception and to the participants' judgment of a sufficient level of accuracy. A small positioning error can be very difficult to detect if the position is not checked from several viewpoints and at a close distance. Furthermore, the low resolution (480 × 640 pixels) of the video output in our implementation and the 2D representation of virtual objects can affect the accuracy in both methods. The large amount of variation (Fig. 6(b)) in the positioning errors of SlidAR can be explained by the threshold for adjusting the object's position away from the epipolar line. Because an arbitrary adjustment with SlidAR was impossible, the virtual object first had to be moved with the cut & paste function and then adjusted again along the new epipolar line. Some participants may have settled for a certain level of accuracy due to the effort required for repositioning, even if they were aware that the position was not accurate enough.
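Why a small error is hard to detect from a single viewpoint can be shown with a simple pinhole projection. The following sketch is illustrative only: the camera positions, the 1 m working distance, and the focal length of 500 pixels are assumptions, not values from the experiment. Only the 12.8 mm error magnitude is taken from the results above.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def project(point: Vec3, cam_pos: Vec3, focal_px: float) -> Tuple[float, float]:
    """Project a world point with an ideal pinhole camera.

    Assumption: the camera looks along +z and is not rotated.
    """
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal_px * x / z, focal_px * y / z)


def pixel_offset(a: Vec3, b: Vec3, cam_pos: Vec3, focal_px: float) -> float:
    """Distance in pixels between the projections of two world points."""
    ua, va = project(a, cam_pos, focal_px)
    ub, vb = project(b, cam_pos, focal_px)
    return math.hypot(ua - ub, va - vb)


FOCAL_PX = 500.0                 # assumed focal length in pixels
target = (0.0, 0.0, 1000.0)      # assumed target 1 m in front of viewpoint A (mm)
placed = (0.0, 0.0, 1012.8)      # object placed 12.8 mm too far along the viewing ray

cam_a = (0.0, 0.0, 0.0)          # viewpoint used for the placement
cam_b = (300.0, 0.0, 0.0)        # second viewpoint, 30 cm to the side (assumed)

offset_a = pixel_offset(target, placed, cam_a, FOCAL_PX)
offset_b = pixel_offset(target, placed, cam_b, FOCAL_PX)
```

Under these assumptions the error is invisible from the original viewpoint (offset_a is zero) and amounts to only a couple of pixels from the second viewpoint, which is consistent with the need to check the position from several viewpoints and at a close distance.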
The overall and normalized device movement needed was significantly higher with HoldAR because the position had to be adjusted and confirmed several times. The movement required while using SlidAR was more consistent. Furthermore, the adjustment was done with a finger gesture without the need to move the handheld device. The significant difference in movement between the easy and hard tasks with HoldAR can be associated with perceptual issues in understanding depth cues. The viewpoint had to be changed if the position of the object and its shadow was unclear. With SlidAR, we did not find significant differences between the easy and hard tasks. As such, based on our observations, the efficiency of SlidAR was not dependent on the environment's complexity.
The subjective results from HARUS correlate strongly with the results from the objective measurements. Completing the tasks with SlidAR took less effort in terms of time and movement, which is reflected in the overall manipulability scores. The comprehensibility scores also differed significantly, but this was mainly due to S9 and S12, which are related to the difficulties in controlling and perceiving the position accurately. The remaining comprehensibility statements were, as expected, not significantly different, because both positioning methods were implemented in the same HAR system and their user interfaces were very similar.
Although the experiment results support only H1 and H3 but not H2, we argue that SlidAR was more efficient in our test scenario. It can achieve the same level of accuracy with significantly less time and less effort than the HoldAR method. H4 was supported only partially, but it shows that the environment can affect methods that require virtual depth cues to be displayed in the environment.