Figure 2 outlines the Gestonurse system architecture. Depth maps streamed from the Kinect sensor are processed by the gesture recognition module, while a microphone concurrently captures voice commands that are interpreted by the speech recognition module.
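
The two concurrent input paths can be illustrated with a minimal sketch. It is not the Gestonurse implementation. All function names, the shared event queue, and the placeholder inputs are assumptions introduced for illustration. In place of real classifiers, each "frame" or "clip" is assumed to already carry its recognized label (or `None` when nothing is recognized).

```python
import queue
import threading


def gesture_recognition(depth_frames, events):
    """Hypothetical stand-in for the gesture recognition module.

    Each item is assumed to be an already-classified label, or None
    when the frame contains no recognizable gesture.
    """
    for frame in depth_frames:
        if frame is not None:
            events.put(("gesture", frame))


def speech_recognition(audio_clips, events):
    """Hypothetical stand-in for the speech recognition module."""
    for clip in audio_clips:
        if clip is not None:
            events.put(("speech", clip))


def run_pipeline(depth_frames, audio_clips):
    """Run both modules concurrently and collect their commands."""
    events = queue.Queue()  # thread-safe channel shared by both modules
    workers = [
        threading.Thread(target=gesture_recognition, args=(depth_frames, events)),
        threading.Thread(target=speech_recognition, args=(audio_clips, events)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    # Both producers have finished, so draining the queue is safe here.
    collected = []
    while not events.empty():
        collected.append(events.get())
    return collected


if __name__ == "__main__":
    # Placeholder labels; the interleaving across modalities is not fixed.
    print(run_pipeline(["g1", None, "g2"], ["v1"]))
```

A shared queue is used here only so that commands from either modality arrive at a single consumer. How the actual system combines the two streams is not specified in this passage.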