The Predator UAV, developed in the 1990s, went from concept to deployment in less than 30 months, which is extremely fast by military procurement standards.
Little wonder, then, that the UAV exhibited quite a few kinks upon entering the field. Among other things, it often failed when flying in bad weather, it was troublesome to operate and maintain, and its infrared and daylight cameras had great difficulty discerning targets. But because commanders needed the drone quickly, they were willing to accept these imperfections, expecting that future upgrades would iron out the problems. They didn't have time to wait until the drone had been thoroughly field-tested.
But how do you test a fully autonomous system? With a robot that is remotely operated or that navigates via GPS waypoints, the vehicle's actions are known in advance. Should it deviate from its instructions, a human operator can issue an emergency shutdown command.
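To make that contrast concrete, here is a minimal sketch of the kind of supervision that suffices for a waypoint-following vehicle. The route, the 15-meter tolerance, and the shutdown signal are all hypothetical, invented for illustration rather than drawn from any fielded system; the point is simply that the expected behavior can be written down in advance and checked against.

```python
import math

# Hypothetical planned route and tolerance (illustrative values only).
WAYPOINTS = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)]
MAX_DEVIATION_M = 15.0  # allowed cross-track error before intervening


def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def nearest_leg_deviation(position, waypoints):
    """Smallest distance from the vehicle to any leg of the planned route."""
    best = float("inf")
    for (ax, ay), (bx, by) in zip(waypoints, waypoints[1:]):
        px, py = position
        # Project the position onto the segment and clamp to its endpoints.
        seg_len_sq = (bx - ax) ** 2 + (by - ay) ** 2
        t = 0.0 if seg_len_sq == 0 else max(
            0.0, min(1.0, ((px - ax) * (bx - ax) + (py - ay) * (by - ay)) / seg_len_sq))
        closest = (ax + t * (bx - ax), ay + t * (by - ay))
        best = min(best, distance(position, closest))
    return best


def supervise(position):
    """Return 'continue' while the vehicle tracks its route, else 'shutdown'."""
    if nearest_leg_deviation(position, WAYPOINTS) > MAX_DEVIATION_M:
        return "shutdown"   # the cue for an emergency shutdown command
    return "continue"


print(supervise((50.0, 3.0)))    # on course -> continue
print(supervise((50.0, 40.0)))   # far off the route -> shutdown
```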
However, if the vehicle is making its own decisions, its behavior can't be predicted. Nor will it always be clear whether the machine is behaving appropriately and safely. Countless factors can affect the outcome of a given test: the robot's cognitive information processing, external stimuli, variations in the operational environment, hardware and software failures, false stimuli, and any new and unexpected situation a robot might encounter. New testing methods are therefore needed that provide insight and introspection into why a robot makes the decisions it makes.
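One way to picture that kind of introspection is a decision trace: each time the robot chooses an action, it also records what it sensed and which rule or judgment drove the choice, so evaluators can later ask why. The controller, thresholds, and field names below are assumptions made up for this sketch, not anyone's actual software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecisionRecord:
    """One introspectable decision: what was sensed, what was chosen, and why."""
    inputs: dict
    action: str
    rationale: str


@dataclass
class IntrospectableController:
    trace: List[DecisionRecord] = field(default_factory=list)

    def decide(self, inputs: dict) -> str:
        # Illustrative decision logic only.
        if inputs.get("obstacle_range_m", 1e9) < 20.0:
            action, why = "brake", "obstacle closer than 20 m"
        elif inputs.get("visibility_m", 1e9) < 100.0:
            action, why = "slow_down", "visibility below 100 m"
        else:
            action, why = "proceed", "no constraint violated"
        self.trace.append(DecisionRecord(inputs, action, why))
        return action


ctrl = IntrospectableController()
ctrl.decide({"obstacle_range_m": 12.0, "visibility_m": 500.0})
ctrl.decide({"obstacle_range_m": 80.0, "visibility_m": 60.0})
for rec in ctrl.trace:
    print(rec.action, "<-", rec.rationale)
```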
Gaining such insight into a machine is akin to performing a functional MRI on a human brain. By watching which areas of the brain experience greater blood flow and neuronal activity in certain situations, neuroscientists gain a better understanding of how the brain operates. For a robot, the equivalent would be to conduct software simulations to tap the "brain" of the machine. By subjecting the robot to various conditions, we could then watch what kinds of data its sensors collect, how it processes and analyzes those data, and how it uses them to arrive at a decision.
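As a rough sketch of what such a simulation might record, consider a toy "brain" driven through synthetic conditions, with the raw sensor readings, the derived features, and the resulting decision all logged at every step. The sensor model, the features, and the conditions ("clear" and "fog") are assumptions invented for this example.

```python
import random

random.seed(1)


def simulate_sensors(condition):
    """Generate synthetic sensor readings for a named test condition (toy model)."""
    fog = condition == "fog"
    return {
        "lidar_range_m": random.uniform(5, 150) * (0.5 if fog else 1.0),
        "camera_contrast": random.uniform(0.1, 1.0) * (0.3 if fog else 1.0),
    }


def process(sensors):
    """The stage we want visibility into: raw sensor data -> derived features."""
    return {
        "obstacle_near": sensors["lidar_range_m"] < 30.0,
        "target_visible": sensors["camera_contrast"] > 0.25,
    }


def decide(features):
    if features["obstacle_near"]:
        return "avoid"
    return "track_target" if features["target_visible"] else "hold"


# Drive the toy brain through each condition and log the full chain every step.
for condition in ["clear", "fog"]:
    for trial in range(3):
        sensors = simulate_sensors(condition)
        features = process(sensors)
        action = decide(features)
        print(condition, trial, sensors, features, action)
```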
Another illuminating form of testing that is often skipped in the rush to deploy today's military robots involves simply playing with the machines on an experimental "playground." The playground has well-defined boundaries and safety constraints that allow humans as well as other robots to interact with the test robot and observe its behavior. Here, it's less important to know the details of the sensor data and the exact sequence of decisions the machine is making; what emerges on the playground is whether the robot's behavior is acceptably safe and appropriate.
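In code, a playground test is less about peering into the machine's head and more about checking its observed behavior against the boundaries and safety constraints. A minimal sketch, with a made-up arena, speed limit, and keep-out zone, might look like this:

```python
# Hypothetical playground constraints (illustrative values only).
ARENA = (0.0, 0.0, 200.0, 200.0)        # x_min, y_min, x_max, y_max in meters
MAX_SPEED = 5.0                          # m/s near humans and other robots
KEEP_OUT = (80.0, 80.0, 120.0, 120.0)    # zone reserved for human observers


def inside(box, x, y):
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def check_step(x, y, speed):
    """Return a list of constraint violations for one observed state."""
    violations = []
    if not inside(ARENA, x, y):
        violations.append("left the arena")
    if inside(KEEP_OUT, x, y):
        violations.append("entered keep-out zone")
    if speed > MAX_SPEED:
        violations.append(f"speed {speed:.1f} m/s exceeds limit")
    return violations


# Observed trajectory, as it might come from tracking the robot on the playground.
trajectory = [(10.0, 10.0, 2.0), (90.0, 95.0, 3.0), (150.0, 150.0, 7.5)]
for step, (x, y, speed) in enumerate(trajectory):
    for violation in check_step(x, y, speed):
        print(f"step {step}: {violation}")
```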
Moving to smarter and more autonomous systems will place an even greater burden on human evaluators and their ability to parse the outcomes of all this testing. But they'll never be able to assess every possible outcome; the situations an autonomous machine can encounter are effectively limitless. Clearly, we need a new way of testing autonomous systems that is statistically meaningful and also inspires confidence in the results. And of course, for us to feel confident that we understand the machine's behavior and trust its decision making, such tests will need to be completed before the autonomous robot is deployed.
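Since no test campaign can enumerate every situation, one statistically meaningful approach is to sample scenarios at random and put a confidence bound on the observed failure rate. The sketch below assumes a toy pass/fail trial and a simple normal-approximation bound; both choices are illustrative, not a prescription for how such testing must be done.

```python
import math
import random

random.seed(7)


def run_random_scenario():
    """Stand-in for one randomized simulation or playground trial (True = safe behavior)."""
    return random.random() > 0.02   # toy system that misbehaves about 2% of the time


def failure_bound(trials):
    """Run randomized trials and bound the true failure rate from above."""
    failures = sum(0 if run_random_scenario() else 1 for _ in range(trials))
    p_hat = failures / trials
    # 95% upper bound via normal approximation (an illustrative modeling assumption).
    upper = p_hat + 1.96 * math.sqrt(max(p_hat * (1 - p_hat), 1e-12) / trials)
    return failures, p_hat, upper


failures, p_hat, upper = failure_bound(10_000)
print(f"{failures} failures in 10,000 trials; "
      f"estimated failure rate {p_hat:.3%}, 95% upper bound {upper:.3%}")
```

The practical question such a calculation answers is not "will the robot ever fail?" but "how many trials do we need before we can state, with quantified confidence, that its failure rate is below an acceptable threshold?"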