We consider situations in which aggregated data for $N$ administrative regions or cells within the study area are observed. The available data consist of $\{(y_i, E_i, x_i)\}_{i=1}^{N}$, where $y_i$ is the number of cases of disease in cell $i$, $E_i$ is the expected number of cases of disease in cell $i$ in the absence of clustering, and $x_i = (x_{1i}, x_{2i})$ is the geographic centroid of cell $i$. The expected number of cases, $E_i$, may reflect the overall disease rate applied to the regional population at risk or may be fitted values from a non-spatial Poisson regression model incorporating the effects of individual and/or regional covariates. We assume that the $y_i$ are independent Poisson random variables with mean $\rho_i E_i$.
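As an illustration of the first option for $E_i$ (the overall disease rate applied to each cell's population at risk), the following is a minimal sketch. The function name `expected_counts`, the population vector `pop`, and the numerical values are hypothetical and serve only to show the arithmetic; the regression-based alternative is not shown.

```python
import numpy as np

def expected_counts(cases, population):
    """Expected cases per cell: overall rate times population at risk.

    Both arguments are hypothetical inputs of length N (one entry per cell).
    """
    cases = np.asarray(cases, dtype=float)
    population = np.asarray(population, dtype=float)
    overall_rate = cases.sum() / population.sum()  # pooled rate over all cells
    return overall_rate * population

# Made-up data for N = 4 cells (illustration only).
y = np.array([3, 10, 4, 3])               # observed cases y_i
pop = np.array([1000, 2000, 1500, 500])   # population at risk per cell
E = expected_counts(y, pop)               # expected cases E_i
```

With this construction the expected counts sum to the observed total, so the $\rho_i$ are interpreted relative to the pooled rate.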
For any subset $Z$ of the study region, we consider the following model for $\rho_i$: $\log(\rho_i) = \alpha_Z + \theta_Z \delta_Z(x_i)$, where $\delta_Z(x_i) = 1$ if $x_i \in Z$ and $\delta_Z(x_i) = 0$ otherwise, $\alpha_Z$ is the log disease risk for locations outside $Z$, and $\theta_Z$ is the log relative risk for locations inside $Z$ compared with those outside. If $Z$ is not a cluster, $\theta_Z = 0$; if $Z$ is a cluster, $\theta_Z \neq 0$. We consider the two-sided alternative (clusters with elevated or reduced risk) rather than the one-sided alternative (clusters with elevated risk only). The issues discussed here arise regardless of the choice of alternative.
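To make the role of the indicator concrete, the sketch below evaluates $\delta_Z(x_i)$ and the implied $\rho_i$ for a candidate zone. The text allows $Z$ to be any subset of the study region; the circular zone used here (a center and a radius), the function names, and all numerical values are assumptions chosen only for illustration.

```python
import numpy as np

def zone_indicator(centroids, center, radius):
    """delta_Z(x_i): 1 if centroid x_i lies in the zone, else 0.

    Assumes a circular zone, which is only one possible choice of Z.
    """
    centroids = np.asarray(centroids, dtype=float)
    dist = np.linalg.norm(centroids - np.asarray(center, dtype=float), axis=1)
    return (dist <= radius).astype(int)

def relative_risk(delta, alpha, theta):
    """rho_i implied by log(rho_i) = alpha_Z + theta_Z * delta_Z(x_i)."""
    return np.exp(alpha + theta * delta)

# Three made-up centroids; the zone is a disc of radius 1.5 about the origin.
centroids = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
delta = zone_indicator(centroids, center=(0.0, 0.0), radius=1.5)
rho = relative_risk(delta, alpha=0.0, theta=np.log(2.0))
```

In this example the two cells inside the zone have twice the risk of the cell outside it; setting `theta=0.0` makes every $\rho_i$ equal, which is the no-cluster case.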
The evidence for $Z$ as a cluster is given by the log likelihood ratio test statistic for $H_0\colon \theta_Z = 0$ versus $H_A\colon \theta_Z \neq 0$,