Too bad computers aren't more like people. When we work harder, our hearts beat faster. When we're hot, we sweat. But in the 54 years since British mathematician Alan Turing introduced the notion of artificial intelligence, computer scientists haven't delivered anything close to a self-aware and self-healing computer.
That may change soon enough. Researchers in business and government labs are building systems that will challenge what it means to be an IT worker by automating many of the monitoring and maintenance tasks done today by hand. Scientists in labs from IBM to the Department of Defense are developing adaptive systems able to manage, heal, defend, configure, and optimize themselves without human intervention, just as our bodies can combat an infection without conscious effort. While that ultimate goal is still years away, the first generation of business-ready, self-adapting software tools is within reach.
Biology is more than an analogy here. Researchers have turned to the natural world for inspiration in developing adaptive technologies flexible enough to cope with the increasingly complex computer and business systems that drive our world. Whether through cell technology or the social interactions of ants, biology provides ideas for researchers who want computer users to be able to focus on business goals without having to tell a server the optimum way to do so. It's similar to how a person focuses on a goal--say, climbing a mountain--without consciously telling the body everything it must do to get there.
The motivating factor behind it all: to wage war on complexity. The interlocking pieces of software that make up business computer networks will soon be beyond the comprehension of most IT workers. Plus, these complex systems tend to be fragile, breaking down when even minor changes are made. The complexity results in cost overruns, implementation delays, staffing problems, productivity losses, and missed business opportunities. "As we talked to our customers, we kept hearing the same refrain," says Greg Burke, director of IBM's eLiza project. "Technology is too complex and IT departments are having trouble keeping up with the maintenance requirements in their multivendor, multiarchitecture environments."
The eLiza project team at IBM is developing hardware, software, and networks that will be able to allocate computing resources as needed, safeguard data, and ensure business continuity in case of a disaster. IBM has already brought some of that work to market, such as the eLiza E-business-management services, which match IT resource availability with business requirements to ensure that business-performance levels are met.
One major challenge to implementing self-managing technology will be prying the fingers of managers from the controls of their IT systems. As with any new automated technology, the transition will be difficult, as the IT equivalent of John Henry tries to prove he can still optimize a network better than a computer can. George Vrabel, senior audit director for Bank of America Corp. in Jacksonville, Fla., is mindful that fixing a bug often introduces new errors. "Self-healing and -configuring systems will be good--as long as they don't create other problems," he says.
The majority of the auditing work to make sure computer architectures and IT security meet bank policies is done manually today, Vrabel says. He'd be comfortable using a proven system, he says, though he would still want to be informed when something breaks, even if the system can fix itself.
Vendors are starting to deliver some early examples of self-healing software features to the marketplace, such as Windows XP's ability to automatically grab software updates over the Internet, notes Charles Nettles, chief technology executive at McKesson Corp., a pharmaceutical and medical-supply company in San Francisco. Nettles expects such tools to greatly boost worker productivity by reducing downtime, but his experience has taught him to keep McKesson from being too early an adopter. "We all know that vendors are capable of making extraordinary claims," he says. "How many times have we heard a vendor say its products were fully integrated only to find they weren't?"
Creating software that's aware of its own behavior and that of interacting components has been a largely unrealized dream in the industry. The most successful attempts are in the area of hardware configuration, such as servers that automatically switch among redundant disks to prevent a total system shutdown. The computer switches that run telecommunications networks also use advanced routing algorithms to move traffic around system outages.
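The redundant-disk failover mentioned above can be pictured in a few lines. This is a minimal sketch, assuming each disk is modeled as a hypothetical read function that raises `OSError` when it fails; real servers do this in storage controllers or drivers, not in application code like this.

```python
from typing import Callable, List, Sequence

# Hypothetical model: each "disk" is a function that reads a block or raises OSError.
ReadFn = Callable[[int], bytes]


class AllReplicasFailed(Exception):
    """Raised only when every redundant copy is unreadable."""


def read_block(replicas: Sequence[ReadFn], block: int) -> bytes:
    """Return the block from the first replica that can serve it.

    A failure on one disk is absorbed by switching to the next mirror, so the
    caller sees an error only if every copy is gone.
    """
    errors: List[OSError] = []
    for read in replicas:
        try:
            return read(block)
        except OSError as exc:
            errors.append(exc)  # note why this disk failed, then try the next one
    raise AllReplicasFailed(f"block {block}: {len(errors)} replica(s) failed")


# Usage: a failed primary disk and a healthy mirror (both hypothetical).
def failed_disk(block: int) -> bytes:
    raise OSError("disk 0 not responding")


def healthy_mirror(block: int) -> bytes:
    return f"data-{block}".encode()


print(read_block([failed_disk, healthy_mirror], 7))  # b'data-7'
```

The point of the design is that the read still succeeds after one disk fails, which is what keeps a single failure from becoming a total shutdown.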
Microsoft has had mixed results trying to make its systems easier to administer. In 1997, it brought out its Zero Administration initiative for automating the administration of client PCs running Windows. It included automatic updates of the client operating system and applications, centralized administration and system lockdown, and persistent caching of data and configuration information. But IT managers complained the utilities often caused software incompatibilities and system failures, says Craig Mundie, Microsoft's chief technology officer. "Going forward, we don't want to force automated configuration on IT managers," he says. "Now, we send them notice of upgrades and let them decide how and when to implement them."

Microsoft has been somewhat more successful with self-healing features in its SQL Server database, introduced in 1998. The software that makes it possible is based on control theory, a discipline used by electrical engineers for years and applied to software design by the Oregon Graduate Institute, says David Campbell, a SQL Server architect. Microsoft adapted that research work, along with index-tuning capabilities developed by Microsoft Research, to give SQL Server the ability to adjust to the shifting demands of a live database environment. For example, SQL Server 7 and SQL Server 2000 can allocate memory automatically, where and when it's needed, to address things such as input/output demands or the size of a buffer pool. The self-adjusting capabilities are widely used by SQL Server customers, Campbell says.
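To make "control theory" concrete, here is a minimal feedback loop of the general kind described: measure how the system is doing, compare it with a target, and adjust a resource in proportion to the gap. This is a hypothetical sketch, not Microsoft's implementation; the choice of buffer-pool hit rate as the measured signal, the gain, and the memory limits are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BufferPoolController:
    """Hypothetical proportional (P) controller that resizes a buffer pool.

    Illustrative only: this is not SQL Server's algorithm. The target hit
    rate, gain, and memory limits are made-up parameters.
    """

    target_hit_rate: float = 0.95  # desired fraction of reads served from memory
    gain_mb: float = 2000.0        # MB of change per unit of hit-rate error
    min_mb: int = 64               # never shrink below this
    max_mb: int = 4096             # never grow past the memory we may use

    def next_size(self, current_mb: int, observed_hit_rate: float) -> int:
        # Positive error means the pool is too small (too many reads go to disk).
        error = self.target_hit_rate - observed_hit_rate
        proposed = current_mb + self.gain_mb * error
        # Clamp so the controller cannot starve other memory consumers.
        return round(max(self.min_mb, min(self.max_mb, proposed)))


# Usage: feed the controller a hit rate measured after each adjustment.
controller = BufferPoolController()
size_mb = 512
for hit_rate in [0.80, 0.88, 0.93, 0.96]:
    size_mb = controller.next_size(size_mb, hit_rate)
    print(hit_rate, size_mb)
```

The pool grows while the hit rate is below target and shrinks slightly once it overshoots, with no administrator picking a number. A production controller would typically add smoothing or integral terms to avoid oscillation.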
Adoption of self-managing systems will play out in three phases, Mundie predicts. First will come automatic online updates, such as those Windows XP performs. Next will come policy-based IT systems, created by adding software to existing hardware and operating systems, that automatically configure machines and deploy software according to corporate IT policies. These two phases will happen in the next three years, Mundie says. But it will take 10 to 20 years--and require new software, operating systems, and architectures--for true self-healing systems to be adopted by business. The struggle to create those systems is playing out now in government and private labs.
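Mundie's second phase, policy-based configuration, can be pictured as comparing each machine against a declared policy and producing the list of changes needed to bring it into line. This is a hypothetical sketch: the `Policy` and `Machine` types, the package name, and the setting are invented for illustration and do not correspond to any Microsoft product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


@dataclass
class Policy:
    """Hypothetical corporate IT policy: minimum software versions and required settings."""

    min_versions: Dict[str, Version] = field(default_factory=dict)
    required_settings: Dict[str, str] = field(default_factory=dict)


@dataclass
class Machine:
    name: str
    installed: Dict[str, Version]
    settings: Dict[str, str]


def plan_actions(machine: Machine, policy: Policy) -> List[str]:
    """List the deployments and setting changes that bring a machine into compliance."""
    actions: List[str] = []
    for pkg, minimum in sorted(policy.min_versions.items()):
        current = machine.installed.get(pkg)
        # Tuples compare element by element, so (4, 0) < (4, 2).
        if current is None or current < minimum:
            actions.append(f"deploy {pkg} {'.'.join(map(str, minimum))}")
    for key, value in sorted(policy.required_settings.items()):
        if machine.settings.get(key) != value:
            actions.append(f"set {key}={value}")
    return actions


# Usage with made-up data.
policy = Policy(min_versions={"antivirus": (4, 2)},
                required_settings={"screen_lock_minutes": "10"})
pc = Machine("pc-17", installed={"antivirus": (4, 0)}, settings={})
print(plan_actions(pc, policy))  # ['deploy antivirus 4.2', 'set screen_lock_minutes=10']
```

The function returns a plan rather than applying it, which leaves room for the approach Mundie describes: notify IT managers and let them decide how and when changes take effect.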
The technology behind IBM's eLiza project is being developed by a team of computer scientists, physicists, and mathematicians at IBM Research. During the past decade, the team has developed learning algorithms that can detect whether a computer system is healthy or in decline. The algorithms can automatically sense when resources need to be reallocated and make the appropriate fixes without human intervention. The technology has been used in the MVS mainframe operating system since 1994--optimizing the thousands of processes, CPU configurations, and other features of the system that became too complex for system administrators to handle manually. Now, IBM's challenge is to deploy the technology in the distributed world.
Enter IBM's Heterogeneous Workload Management software, which uses self-teaching algorithms to automate much of the system configuration work, such as allocating CPU capacity, now done by IT workers. The software takes a snapshot of the system every 10 seconds to monitor changes in performance. It's being tested by IBM customers in the insurance and financial-services fields, and the first iteration will be integrated into IBM server operating systems later this year. It should provide much-needed help: A server can be configured in about 500 ways, so an administrator may take days or weeks to figure out a good configuration for optimizing the company's environment.
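The monitor-then-reallocate cycle described in the last two paragraphs can be sketched as follows. This is a hypothetical illustration, not IBM's algorithm: it uses a rolling baseline that adapts to each workload's own history (a much simpler stand-in for the self-teaching algorithms the article describes), flags decline when recent response times rise well above that baseline, and then splits CPU capacity in proportion to demand. Only the 10-second snapshot interval comes from the article; the window sizes, the 1.5x threshold, and the workload names are assumptions.

```python
from collections import deque
from statistics import mean
from typing import Deque, Dict

SNAPSHOT_INTERVAL_S = 10  # sampling period cited in the article; no scheduler shown here


class HealthMonitor:
    """Hypothetical health check: flags decline when recent response times
    rise well above the workload's own recent history.

    Window sizes and the 1.5x threshold are illustrative assumptions.
    """

    def __init__(self, baseline_len: int = 30, recent_len: int = 3, threshold: float = 1.5):
        self.recent_len = recent_len
        self.threshold = threshold
        # Keeps only the newest baseline_len + recent_len snapshots.
        self.history: Deque[float] = deque(maxlen=baseline_len + recent_len)

    def record(self, response_ms: float) -> None:
        """Store one snapshot, e.g. one taken every SNAPSHOT_INTERVAL_S seconds."""
        self.history.append(response_ms)

    def is_declining(self) -> bool:
        samples = list(self.history)
        if len(samples) <= self.recent_len:
            return False  # too little history to form a baseline yet
        baseline = mean(samples[:-self.recent_len])
        recent = mean(samples[-self.recent_len:])
        return recent > self.threshold * baseline


def reallocate_cpu(demand: Dict[str, float], total_cpus: float) -> Dict[str, float]:
    """Split CPU capacity among workloads in proportion to measured demand."""
    if not demand:
        return {}
    total_demand = sum(demand.values())
    if total_demand == 0:
        even = total_cpus / len(demand)  # no demand signal: share evenly
        return {name: even for name in demand}
    return {name: total_cpus * d / total_demand for name, d in demand.items()}


# Usage: steady response times, then a slowdown that triggers reallocation.
monitor = HealthMonitor(baseline_len=5, recent_len=2)
for ms in [100, 100, 100, 100, 100, 200, 200]:
    monitor.record(ms)
if monitor.is_declining():
    print(reallocate_cpu({"claims": 3.0, "billing": 1.0}, total_cpus=8))
    # {'claims': 6.0, 'billing': 2.0}
```

Comparing each workload against its own history, rather than a fixed threshold, is what lets the same check work across differently configured servers without an administrator tuning it.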