Introduction
Source code is a form of expression, so quality of expression must be a primary concern. Poor
quality of expression creates ongoing costs and hurts the productivity (and hence the reputation) of
the team that produces it. Therefore software authors must learn to clearly express the intent of
variables, methods, classes, and modules.
Furthermore, maintenance errors occur when a programmer does not correctly understand the code he
must modify. The diligent programmer may expend considerable effort to accurately decipher the true
intent of a program, even though its purpose was obvious to the original author. Indeed, it often
takes far more time to work out how to change the code than to make and test the change.
Some efforts to improve the maintainability of code have actually exacerbated the problem by making
names less intuitive, less obvious, and less natural. They make it easier to glean information that
was never lost (such as type and scope, which are still present in the variable declarations) without
making it any easier to discover the more important information that was lost: the author's intent.
An expressive name for a software object must be clear, precise, and small. This is a guide to better
clarity, precision, and terseness in naming. It is the author's hope that you will find it useful.
Use Intention-revealing Names
We often see comments used where better naming would be appropriate:
int d; // elapsed time in days
Such a comment is an excuse for not using a better variable name. The name 'd' doesn't evoke a sense
of time, nor does the comment say which time interval it represents. We need a name that specifies
both what is being measured and the unit of measurement, such as one of these:
int elapsedTimeInDays;
int daysSinceCreation;
int daysSinceModification;
int fileAgeInDays;
The simple act of using a better name instead of applying a better comment can reduce the difficulty of
working with the code we write.
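The same contrast can be seen in a short Python sketch (Python being the other language this guide
uses). Both functions compute the same value. The names elapsed, daysSinceCreation, creationDate, and
today are hypothetical, and the sketch assumes the dates arrive as datetime.date objects.

```python
from datetime import date


def elapsed(c, t):
    d = (t - c).days  # elapsed time in days
    return d


def daysSinceCreation(creationDate, today):
    # The function and parameter names carry the meaning
    # that the comment in elapsed() has to supply.
    return (today - creationDate).days


print(daysSinceCreation(date(2024, 1, 1), date(2024, 1, 31)))  # 30
```

A caller of daysSinceCreation never needs to look inside it to learn what the result means.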
What is the purpose of this Python code?
def getThem(theList):
    list1 = []
    for x in theList:
        if x[0] == 4:
            list1.append(x)
    return list1
Why is it hard to tell what this code is doing? Clearly there are no complex expressions. Spacing and
indentation are reasonable. There are only three variables and two constants mentioned at all. There
aren't even any fancy classes or overloaded operators, just a list of lists (or so it seems).
The problem isn't the simplicity of the code but the implicity of the code: the degree to which the
context is not explicit in the code itself. The code requires me to answer questions such as:
● What kinds of things are in theList?
● What is the significance of the zeroth subscript of an item in theList?
● What is the significance of the value 4?
● How would I use the list being returned?
This information is not present in the code sample, but it could have been. Say that we're working on a
minesweeper game. We find that the board is a list of cells called theList. Let's rename that to
theBoard.
Each cell on the board is represented by a simple array. We further find that the zeroth subscript is the
location of a status value, and that a status value of 4 means 'flagged'. Just by giving these concepts
names we can improve the code considerably:
def getFlaggedCells(theBoard):
    flaggedCells = []
    for cell in theBoard:
        if cell[STATUS_VALUE] == FLAGGED:
            flaggedCells.append(cell)
    return flaggedCells
Notice that the simplicity of the code is not changed. It still has exactly the same number of operators
and constants, with exactly the same number of nesting levels.
We can go further and write a simple class for cells instead of using an array of ints. We can also
replace the expression that checks whether a cell is flagged with an intention-revealing method (call it
isFlagged) that hides the magic numbers. The result is a new version of the function:
def getFlaggedCells(theBoard):
    flaggedCells = []
    for cell in theBoard:
        if cell.isFlagged():
            flaggedCells.append(cell)
    return flaggedCells
or more tersely:
def getFlaggedCells(theBoard):
    return [cell for cell in theBoard if cell.isFlagged()]
Even with the function collapsed to a list comprehension, it's not difficult to understand. My original
four questions are answered fully, and the implicity of the code is reduced. This is the power of
naming.
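The minesweeper versions can be combined into one runnable Python sketch. The Cell class, its status
attribute, and the sample boards are assumptions made for illustration; only the status index 0 and
the value 4 meaning "flagged" come from the text. Cells are collected with append, because += on a
list splices in the cell's elements instead of adding the cell itself.

```python
STATUS_VALUE = 0  # index of the status entry in a raw cell array
FLAGGED = 4       # status value meaning "flagged"


def getFlaggedCellsFromLists(theBoard):
    # Raw arrays, but the magic numbers now have names.
    flaggedCells = []
    for cell in theBoard:
        if cell[STATUS_VALUE] == FLAGGED:
            flaggedCells.append(cell)
    return flaggedCells


class Cell:
    """Hypothetical cell class that hides the raw status encoding."""

    def __init__(self, status):
        self.status = status

    def isFlagged(self):
        return self.status == FLAGGED


def getFlaggedCells(theBoard):
    # The intent is carried by the method name.
    return [cell for cell in theBoard if cell.isFlagged()]


rawBoard = [[4], [0], [4]]
print(len(getFlaggedCellsFromLists(rawBoard)))  # 2

board = [Cell(4), Cell(0), Cell(4)]
print(len(getFlaggedCells(board)))  # 2

# Why append and not +=: += extends the list with the cell's elements.
spliced = []
spliced += [4, 0]
print(spliced)  # [4, 0], not [[4, 0]]
```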
Avoid Disinformation
A software author must avoid leaving false clues which obscure the meaning of code.
We should avoid words whose entrenched meanings vary from our intended meaning. For example,
"hp", "aix", and "sco" would be poor variable names because they are the names of Unix platforms
or variants. Even if you are coding a hypotenuse and "hp" looks like a good abbreviation, it is
disinformative.
Do not refer to a grouping of accounts as an AccountList unless it's actually a list. The word
list means something specific to programmers. If the container holding the accounts is not actually a
list, the name may lead to false conclusions. AccountGroup or BunchOfAccounts would be
better.
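A short Python sketch of the false conclusions such a name invites. The account ids and balances are
made-up sample data, and the dict is an assumed container; the point is only that the name accountList
promises list behavior the object does not have.

```python
# Disinformative: the name says "list", but the object is a dict
# mapping made-up account ids to balances.
accountList = {"A-100": 250.0, "B-200": 75.5}

# A reader trusting the name expects iteration to yield accounts,
# but iterating a dict yields its keys.
print([account for account in accountList])  # ['A-100', 'B-200']

# Positional access, natural for a list, fails on a dict.
try:
    accountList[0]
except KeyError:
    print("no account with key 0")

# Renamed: the same data under a name that claims no container type.
accountGroup = accountList
print(sum(accountGroup.values()))  # 325.5
```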
Beware of using names that vary in small ways. How long does it take to spot the subtle difference
between an XYZControllerForEfficientHandlingOfStrings in one module and, somewhere a little more
distant, an XYZControllerForEfficientStorageOfStrings? The words have frightfully similar shapes.
Modern Java environments provide automatic code completion. You type a few characters of a name,
press a hot key combination (if that), and are greeted with a list of possible completions for that
name. It helps if names for very similar things sort together alphabetically, and if the differences
are very, very obvious, because the developer is likely to pick an object by name without seeing your
copious comments or even the list of methods supplied by that class.
A truly awful example of disinformative names would be the use of lowercase L or uppercase O as
variable names, especially in combination. The problem, of course, is that they look almost exactly like
the constants one and zero, respectively.
int a = l;
if ( O == l )
    a = O1;
else
    l = 01;
The reader may think this a contrivance, but the author has examined code where such things were
abundant. It's a great technique for shrouding code. The author of that code suggested using a different
font so that the differences were more obvious, a solution that would have to be passed down to all
future developers as oral tradition or in a written document. A simple renaming conquers the problem
with finality, without creating any new work products.
Make Meaningful Distinctions
A problem arises from writing code solely to satisfy a compiler or interpreter. One can't have the same
name referring to two things in the same scope, so one of the names is changed in an arbitrary way.
Sometimes this is done by misspelling one, leading to the surprising situation where correcting a
spelling error leads to an inability to compile.
It is not sufficient to add number series or noise words, even though the compiler is satisfied. If names
must be different, then they should also mean something different.
Number-series naming (a1, a2, ..., aN) is the opposite of intentional naming. Such names are not
disinformative, but they provide no clue to the intention of the author. Naming by intention and by
domain may lead one to use names like lvalue and rvalue or source and destination rather than
string1 and string2.
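A small Python illustration, assuming a hypothetical routine that copies characters from one sequence
into a mutable one. With a number-series signature such as copyChars(a1, a2), nothing tells the caller
which argument is read and which is overwritten; domain names do.

```python
def copyChars(source, destination):
    """Copy each element of source into the same index of destination."""
    for i, ch in enumerate(source):
        destination[i] = ch


buffer = [" "] * 3
copyChars("abc", buffer)
print(buffer)  # ['a', 'b', 'c']
```

With a1 and a2, a caller could swap the arguments and silently overwrite the source; with source and
destination, the mistake is visible at the call site.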
Noise words are another meaningless distinction. Imagine that you have a Product class. If you have
another called ProductInfo or ProductData, you have made the names different without making
them mean anything different. Info and Data are indistinct noise words like "a", "an" and "the".
Noise words are redundant. The word variable should never appear in a variable name. The word
table should never appear in a table name. How is NameString better than Name? Would a Name
ever be a floating point number? If so, it breaks an earlier rule about disinformation. Imagine finding
one class named Customer and another named CustomerObject. What should you understand as the
distinction? Which one will represent the best path to a customer's payment history?
I know of an application that illustrates this. I've changed the names to protect the guilty,
but the exact form of the error is:
getSomething();
getSomethings();
getSomethingInfo();
How is a programmer on that project supposed to know which of these functions to call?
Consider context. In the absence of a class named Denomination, MoneyAmount is no better than
money. CustomerInfo is no better than Customer.
Disambiguate in such a way that the reader knows what the different versions offer her, instead of
merely that they're different.
Use Pronounceable Names
If you can't pronounce it, you can't discuss it without sounding like an idiot. "Well, over here on the bee
cee arr three cee enn tee we have a pee ess zee kyew int, see?" This matters because programming is a
social activity.
A company I know had genymdhms (generation date, year, month, day, hour, minute, and second), so
they walked around saying "gen why emm dee aich emm ess". I have an annoying habit of pronouncing
everything as written, so I started saying "gen-yah-mudda-hims". Soon a host of designers and analysts
were calling it that too, and we still sounded silly. But we were in on the joke, so it was fun. Fun or
not, we were tolerating poor naming. New developers had to have the variables explained to them, and
then they spoke about them in silly made-up words instead of using proper English terms.
Compare these two versions of the same record class:
import java.util.Date;

class DtaRcrd102 {
    private Date genymdhms;
    private Date modymdhms;
    private final String pszqint = "102";
    /* ... */
}

class Customer {
    private Date generationTimestamp;
    private Date modificationTimestamp;
    private final String recordId = "102";
    /* ... */
}
Intelligent conversation is now possible: "Hey, Mikey, take a look at this record! The generation
timestamp is set to tomorrow's date! How can that be?"
Use Searchable Names
Single-letter names and numeric constants have a particular problem in that they are not easy to locate
across a body of text.
One might easily grep for MAX_CLASSES_PER_STUDENT but the number 7 could be more
troublesome. Searche