6. Concluding remarks
It must be stressed that this analysis provides only a first estimate of an acceptable
geocoding hit rate. Common sense dictates that we should attempt to achieve a
hit rate of 100% every time. It must not be forgotten that even if an 85% hit rate is
achieved, roughly 1 in 7 addresses in a crime table are not being geocoded. This
means that if a police analyst wanted to map 10 000 crime sites, up to 1500 would not
be represented in the final map. That is not an insignificant number.
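The arithmetic here can be sketched as a one-line helper (the function name and figures are illustrative only):

```python
def unmapped_count(total_records: int, hit_rate: float) -> int:
    """Number of records left off the map at a given geocoding hit rate."""
    return round(total_records * (1.0 - hit_rate))

# At an 85% hit rate, 10 000 crime records leave 1500 sites unmapped.
print(unmapped_count(10_000, 0.85))  # -> 1500
```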
The sensible approach for an analyst is to examine the ungeocoded records and
determine if any pattern can be discerned from the geocoding ‘misses’. These
regular misses may be concentrated in one area, or may be easily resolved using an
address scrubbing routine. Address scrubbers work by providing a first pass over a
spatial database prior to geocoding. This first pass is designed to correct common
spelling mistakes, remove unwanted textual complications, and prepare the address
base for maximum geocoding efficiency. Common examples of address scrubbing
operations include changing ‘Gdns’ to ‘Gardens’, removing unit or apartment
numbers, and replacing landmarks with their actual addresses. After the
address file has been run through the scrubber, increases in geocoding efficiency and
accuracy are usually seen. Further gains accrue when the analyst routinely
examines the geocoding misses to determine the cause of each problem. This
process of continual improvement is one of the easiest ways to increase geocoding
efficiency.
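A minimal address-scrubbing pass along these lines might look as follows; the abbreviation table and regular expression are illustrative assumptions, not rules taken from the original study:

```python
import re

# Illustrative expansion table; a production scrubber would carry a much
# larger, locally tuned list of abbreviations.
ABBREVIATIONS = {
    "Gdns": "Gardens",
    "St": "Street",
    "Ave": "Avenue",
}

def scrub_address(address: str) -> str:
    """First pass over an address prior to geocoding: strip unit or
    apartment numbers and expand common abbreviations."""
    # Remove unit/apartment prefixes such as 'Unit 4, ' or 'Apt 12 '.
    address = re.sub(r"\b(Unit|Apt|Apartment)\s*\d+[,\s]*", "",
                     address, flags=re.IGNORECASE)
    # Expand abbreviated street types word by word.
    words = [ABBREVIATIONS.get(word, word) for word in address.split()]
    return " ".join(words).strip(" ,")

print(scrub_address("Unit 4, 12 Kensington Gdns"))  # -> 12 Kensington Gardens
```

Running every record through such a routine before geocoding, and feeding each newly diagnosed miss back into the abbreviation table, is the continual-improvement loop described above.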
Effort should be measured against reward. Expending significant effort to
increase geocoding of a theft from motor vehicle database is unlikely to be
worthwhile if the resultant analysis will not be acted upon and law enforcement
priorities are elsewhere. Few analysts would struggle to rank the relative policy
importance of geocoding a theft from motor vehicle database against that of a serial
homicide database. There are lessons to be learned from both databases, however,
and an understanding of the error characteristics of the former may assist with
geocoding of the latter.
An e-mail enquiry distributed on the list server of the Crime Mapping Research
Center (now the Mapping and Analysis for Public Safety program) of the US
NIJ indicated that in general, law enforcement geocoding hit rates were in the
acceptable range. Nearly forty individuals described their geocoding experiences
with numerous different agencies. The mean geocoding hit rate was 87.5%,
with a standard deviation of 14.1%. The lowest was 41%, while the highest was
99.7%. Slightly more than two thirds of the responses were 90% or greater.
And if an 85% hit rate cannot be achieved? While this study does not suggest
that maps created with data that are geocoded at a lower hit rate are necessarily
showing an incorrect distribution or significantly lower quantity of points, it does
follow that the lower the hit rate the greater the potential for error in spatial
patterns, and there certainly exists the potential to underestimate the magnitude of
any problem. It is suggested here that this empirically derived first estimate of an
acceptable hit rate should be used as a minimum standard.