Unit Testing
What is a unit?
Some standards, particularly ANSI/IEEE Std 1008-1987 (IEEE Standard for Software Unit Testing), use a lax definition of software unit. According to IEEE Std 1008 a software unit "...may occur at any level of the design hierarchy from a single module to a complete program". I think this is because at the time the standard was written most testing was manual, in which case there isn't much difference between planning and managing the test process for a single class and for a complete program. From an economic point of view, though, there is a big difference: testing each individual component was often considered too expensive, so this step was frequently skipped. Some approaches, like Cleanroom, were (partly) born as a reaction to the excessive cost of proper unit testing, particularly in the context of incremental development life cycles. The introduction of automated regression tests at the unit level and of standard test harnesses has changed this balance.
More modern definitions (like the one in the comp.software.testing FAQ) and accepted industry best practice define unit tests in a much more restrictive way:
Unit. The smallest compilable component. A unit typically is the work of one programmer (at least in principle). As defined, it does not include any called sub-components (for procedural languages) or communicating components in general.
Unit Testing. In unit testing called components (or communicating components) are replaced with stubs, simulators, or trusted components. Calling components are replaced with drivers or trusted super-components. The unit is tested in isolation.
For object-oriented programs this means that the unit is usually a class. Some classes (like a simple Stack) might be self-contained, but most call other classes and are in turn called by yet other classes. To reduce confusion when things go wrong, you should try to test each class in isolation. If you don't test a class in isolation, you are implicitly trusting all the classes it uses. You are effectively saying: I think all the other classes already work, and if they don't, I'm prepared to sort out the mess myself. That's what "trusted" means in the definition above. If you don't think the other classes work, you should test in isolation. This is normally more work, as writing stubs and drivers is a pain.
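To make this concrete, here is a minimal sketch in C++ (the Account and AuditLog class names are hypothetical, invented for illustration): the class under test depends on its collaborator only through an abstract interface, so the test can substitute a stub and act as the driver itself.

    #include <cassert>
    #include <string>
    #include <vector>

    // The collaborator interface the unit under test depends on.
    class AuditLog {
    public:
        virtual ~AuditLog() {}
        virtual void Record(const std::string& entry) = 0;
    };

    // The unit under test. It talks to its collaborator only via the interface.
    class Account {
    public:
        explicit Account(AuditLog& log) : log_(log), balance_(0) {}
        void Deposit(int amount) {
            balance_ += amount;
            log_.Record("deposit");
        }
        int Balance() const { return balance_; }
    private:
        AuditLog& log_;
        int balance_;
    };

    // The stub: replaces the real logger so Account is tested in isolation.
    class StubAuditLog : public AuditLog {
    public:
        virtual void Record(const std::string& entry) { entries_.push_back(entry); }
        std::vector<std::string> entries_;
    };

    // The test program acts as the driver, standing in for Account's real callers.
    int main() {
        StubAuditLog log;
        Account account(log);
        account.Deposit(100);
        assert(account.Balance() == 100); // the behaviour of the unit itself
        assert(log.entries_.size() == 1); // its interaction with the collaborator
        return 0;
    }

If an assertion fails here, the fault can only be in Account; no other production class is involved.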
When to test?
As the scope of unit testing narrows down from complete programs to individual classes, so does the meaning of integration testing. Any time you test two or more already unit-tested classes together instead of using stubs, you are doing a little bit of integration testing.
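Continuing the hypothetical Account/AuditLog sketch above: once a real logger has passed its own unit tests, swapping it in for the stub turns the same test into a small integration test.

    #include <iostream>

    // Reuses the Account and AuditLog declarations from the earlier sketch.
    // A real logger, assumed to have passed its own unit tests already.
    class StreamAuditLog : public AuditLog {
    public:
        explicit StreamAuditLog(std::ostream& out) : out_(out) {}
        virtual void Record(const std::string& entry) { out_ << entry << '\n'; }
    private:
        std::ostream& out_;
    };

    int main() {
        StreamAuditLog log(std::cout); // trusted real component instead of the stub
        Account account(log);          // two unit-tested classes exercised together
        account.Deposit(100);
        assert(account.Balance() == 100);
        return 0;
    }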
On some projects integration testing is a big issue, because the team waits until coding is finished before starting unit testing. This is a big mistake: delaying unit testing means doing it under schedule pressure, which makes it all too easy to drop the tests and just finish the code. Developers should expect to spend between 25% and 50% of their time writing unit tests. If they leave testing until they have finished, they can expect to spend the same amount of time testing as they spent writing the module in the first place. This is going to be extremely painful for them. The idea is to spread the cost of unit testing over the whole implementation phase. This is sometimes called "incremental glass-box testing" (see Marc Rettig's article).
If you wait until you've finished coding before you start unit testing, you'll have to choose an integration strategy. Will you start with the low-level classes and work your way up until you reach the classes that expose functionality through a public API, start from the top and write stubs for the lower-level classes, or just test everything in one go?
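As a sketch of the bottom-up option (the Tokenizer class below is hypothetical): you start with the classes that have no dependencies of their own, with the test program acting as the driver, and only then move up to their callers.

    #include <cassert>
    #include <string>

    // A low-level class with no dependencies: the natural place to start bottom-up.
    class Tokenizer {
    public:
        explicit Tokenizer(const std::string& input) : input_(input), pos_(0) {}
        // Returns the next space-delimited token, or "" when the input is exhausted.
        std::string Next() {
            while (pos_ < input_.size() && input_[pos_] == ' ') ++pos_;
            std::size_t start = pos_;
            while (pos_ < input_.size() && input_[pos_] != ' ') ++pos_;
            return input_.substr(start, pos_ - start);
        }
    private:
        std::string input_;
        std::size_t pos_;
    };

    // The driver stands in for the higher-level callers that are not tested yet.
    int main() {
        Tokenizer tokenizer("ab cd");
        assert(tokenizer.Next() == "ab");
        assert(tokenizer.Next() == "cd");
        assert(tokenizer.Next() == "");
        return 0;
    }

Top-down is the mirror image: you test the API-level classes first against stubs like the one in the earlier sketch, and replace the stubs with real classes as they are written.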
Code Coverage
The greatest doubt I had when writing the standard was not only how much coverage to mandate but whether to mandate any coverage at all. It is easy enough to come up with a figure: 85% seems to be pretty standard. But I have to agree with Brian Marick [BM] that there is no evidence supporting this number. In my opinion 100% is the reasonable target, as anything less means there are statements you haven't tested at all. Of course, it is difficult to have automatic unit tests for certain parts of the code: hardware interaction, UI code, and panics are typical examples. If you have acceptance tests that cover these parts of the code or review them thoroughly, and if you make a sincere effort to minimise their size and complexity, you can normally get away with not unit testing them. But I'd rather include any known exceptions to the coverage rule in the standard itself than arbitrarily lower the bar for all the rest of the code.
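One way to keep those untestable parts from dragging the target down is to hide them behind a trivial interface, so that everything containing actual logic can still be driven to full coverage. A minimal sketch, with hypothetical Display and BatteryWarning classes:

    #include <cassert>
    #include <string>
    #include <vector>

    // Thin interface over the untestable part. The real implementation talks to
    // the UI or hardware, stays a few lines long, and is covered by review or
    // acceptance tests instead.
    class Display {
    public:
        virtual ~Display() {}
        virtual void Show(const std::string& text) = 0;
    };

    // All the decision logic lives here and remains fully unit-testable.
    class BatteryWarning {
    public:
        explicit BatteryWarning(Display& display) : display_(display) {}
        void Update(int percent) {
            if (percent < 10) display_.Show("Battery low");
        }
    private:
        Display& display_;
    };

    // Fake display used only by the tests.
    class FakeDisplay : public Display {
    public:
        virtual void Show(const std::string& text) { shown_.push_back(text); }
        std::vector<std::string> shown_;
    };

    int main() {
        FakeDisplay display;
        BatteryWarning warning(display);
        warning.Update(50);                 // covers the "no warning" branch
        assert(display.shown_.empty());
        warning.Update(5);                  // covers the warning branch
        assert(display.shown_.size() == 1);
        return 0;
    }

The coverage rule then only needs an explicit exception for the thin Display implementation, not a lower bar for the whole code base.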
Three pitfalls to consider if you mandate coverage:
Don't treat tests that don't increase coverage as redundant. Dropping them is a big mistake: they might not add coverage, but they might still find bugs.