We have described a fairly complete process for upgrading the OS of a computer, yet we have not mentioned a particular vendor’s OS, particular commands to type, or buttons to click. The important parts of the process are not the technology, which is a matter of reading manuals, but rather communication, attention to detail, and testing.
The basic tool we used is a checklist. We began by developing the checklist, which we then used to determine which services required upgrading, how long the upgrade would take, and when we could do it. The checklist drives what tests we develop, and those tests are used over and over again. We use the tests before and after the upgrade to ensure quality. If the upgrade fails, we activate the back-out plans included in the checklist. When the process is complete, we announce this to the list of concerned customers on the checklist.
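The checklist-driven flow described above can be sketched in a few lines of code. This is only an illustration under stated assumptions, not a prescribed implementation: the `Checklist` class, the `run_upgrade` function, and the demo names (`mail_works`, `broken_upgrade`, `restore`, `mail-users`) are all hypothetical, and the choice to notify customers after a back-out as well as after a success is our assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Checklist:
    """Hypothetical checklist: tests, the upgrade step, the back-out plan, and who to tell."""
    tests: List[Callable[[], bool]]
    upgrade: Callable[[], None]
    back_out: Callable[[], None]
    customers: List[str]
    log: List[str] = field(default_factory=list)


def all_pass(checklist: Checklist) -> bool:
    # Run every test (no short-circuit) so one failure does not hide another.
    results = [test() for test in checklist.tests]
    return all(results)


def run_upgrade(checklist: Checklist) -> str:
    # The same tests run before and after the upgrade.
    if not all_pass(checklist):
        checklist.log.append("pre-upgrade tests failed; upgrade not started")
        return "not started"
    checklist.upgrade()
    if all_pass(checklist):
        outcome = "upgraded"
    else:
        checklist.back_out()  # activate the back-out plan from the checklist
        outcome = "backed out"
    # Assumption: customers are told the outcome either way.
    for customer in checklist.customers:
        checklist.log.append(f"notify {customer}: {outcome}")
    return outcome


# Demo: a simulated host whose upgrade breaks mail, forcing a back-out.
state = {"os_version": 1, "mail_ok": True}


def mail_works() -> bool:
    return state["mail_ok"]


def broken_upgrade() -> None:
    state["os_version"] = 2
    state["mail_ok"] = False


def restore() -> None:
    state["os_version"] = 1
    state["mail_ok"] = True


checklist = Checklist(tests=[mail_works], upgrade=broken_upgrade,
                      back_out=restore, customers=["mail-users"])
outcome = run_upgrade(checklist)
print(outcome, state, checklist.log)
```

Keeping the tests, the back-out plan, and the customer list in one structure mirrors the point that the checklist is the single place where all of this information lives.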
A checklist is a simple tool. It is a single place where all the information is maintained. Whether you use paper, a spreadsheet, or a web page, the checklist is the focal point. It keeps the team on the same page, figuratively speaking, keeps the individuals focused, lets the customers understand the process, helps management understand the status, and brings new team members up to speed quickly.
Like many SA processes, this requires communication skills. Negotiation is a communication process, and we use it to determine when the upgrade will happen, what needs to happen, and what the priorities are if things go wrong. We give the customers a feeling of closure by telling them when we are finished. This helps the customer/SA relationship. We cannot stress enough the importance of putting the checklist on a web page. The more eyes that can review the information, the better.
When the tests are automated, we can repeat them with accuracy and ensure completeness. These tests should be general enough that they can be reused not only for future upgrades on the same host but also on other similar hosts. In fact, the tests should be integrated into your real-time monitoring system. Why perform these tests only after upgrades?
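As a rough sketch of what a reusable test might look like, the example below parameterizes a service check by host, so the same function could be called after an upgrade, against a similar host, or from a monitoring job. The names `tcp_port_open`, `check_host`, `failed_services`, and the `SERVICE_PORTS` table are hypothetical, and a successful TCP connection is only a stand-in for a real functional test of the service.

```python
import socket
from typing import Callable, Dict, Iterable, List

# Illustrative service-to-port table; a real checklist would supply its own.
SERVICE_PORTS: Dict[str, int] = {"ssh": 22, "smtp": 25, "http": 80}


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def check_host(host: str, services: Iterable[str],
               probe: Callable[[str, int], bool] = tcp_port_open) -> Dict[str, bool]:
    """Run the same checks against any host; the probe is injectable for testing."""
    return {name: probe(host, SERVICE_PORTS[name]) for name in services}


def failed_services(results: Dict[str, bool]) -> List[str]:
    """List the services whose check failed, in a stable order."""
    return sorted(name for name, ok in results.items() if not ok)
```

Because the host is a parameter rather than a hard-coded value, a call such as `check_host("mailhost.example.com", ["ssh", "smtp"])` (a made-up host name) could be scheduled by a monitoring system as easily as it is run by hand after an upgrade.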
This simple process can be easily understood and practiced. This is one of the basic processes that an SA must master before moving on to more complicated upgrades. The real-world examples we used all required some kind of deviation from the basic process yet still encompassed the essential points.
Some OS distributions make upgrading almost risk-free and painless, and some are much riskier. Although there are no guarantees, it is much better when an operating system has a way to do upgrades reliably, repeatably, and with the ability to easily revert. Keeping the number of commands or mouse clicks to a minimum reduces the possibility of human error. Being able to upgrade many machines in a repeatable way has many benefits; especially important is that it helps maintain consistent systems. Any ability to revert to a previous state gives a level of undo that is like an insurance policy: You hope you never need it but are glad it exists when you do.