Alzheimer disease (AD) is the most common form of dementia in elderly people and affects an estimated 5.2 million individuals in the United States. It is estimated that 13% of people aged 65 years and older are diagnosed with AD, and the incidence and prevalence increase considerably with age.1 With the aging of the population, physical therapists in geriatrics will be treating an increasing number of people with AD. Given the need to measure outcomes to assess progress or decline in function, specific clinical tools should be tested for reliability and validity with individuals with AD.
Recent publications support the physical and functional benefits of exercise in the management of AD. Identification of appropriate and useful outcome measures for people with AD would enhance the ability to assess the effectiveness of interventions in clinical and research environments. Our current understanding of the psychometric properties of specific clinical tests with this population is limited. Methodological studies assessing the reliability of clinical tools for people with AD or dementia are scarce, but not nonexistent. Given the extremely limited research available exclusively with people with a diagnosis of AD, information gleaned from research with individuals with other types of dementia was included in our review of the literature. Mixed results from studies make it difficult to know which outcome measures will best serve physical therapists’ needs in monitoring change in performance in individuals with AD. Outcome measures that have been studied for reliability with individuals with AD or dementia include the Timed “Up & Go” Test (TUG), the Six-Minute Walk Test (6MWT), and gait speed.
Reliability measurements indicate the degree to which scores of a clinical test are free from measurement errors,11 and although conceptually straightforward, the application of this notion can be complex. Reliability can be expressed as relative reliability or as absolute reliability. If a measurement has high relative reliability, repeated measurements will reveal consistent positioning or ranking of individuals’ scores within a group. If a measurement has high absolute reliability, scores show little variability upon repeated measurement. Relative reliability is measured with correlation coefficients. The intraclass correlation coefficient (ICC) evaluates correlation based upon variance estimates from analysis of variance; the greater the variance shared between sets of measurements, the higher the ICC. The ICC is an appropriate statistic for examining test-retest reliability. As a general guideline, an ICC above .75 is considered to demonstrate good reliability; for clinical measures, it is suggested that reliability should exceed .90 to ensure reasonable validity.
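To illustrate how an ICC can be derived from analysis-of-variance estimates, the following is a minimal sketch, not the procedure of any particular study. It assumes the two-way random-effects, single-measurement form often written ICC(2,1); the text above does not specify which ICC model applies. The function name `icc_2_1` and the sample scores are hypothetical.

```python
def icc_2_1(scores):
    """Return ICC(2,1) for repeated measurements.

    scores: list of rows, one row per subject, each row holding the
    k repeated measurements (e.g., test and retest) for that subject.
    Assumption: two-way random-effects model, single measurement.
    """
    n = len(scores)          # number of subjects
    k = len(scores[0])       # number of sessions per subject
    grand = sum(sum(row) for row in scores) / (n * k)

    subject_means = [sum(row) / k for row in scores]
    session_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_sessions = n * sum((m - grand) ** 2 for m in session_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_sessions

    # Mean squares
    ms_subjects = ss_subjects / (n - 1)
    ms_sessions = ss_sessions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_sessions - ms_error) / n
    )


# Hypothetical test-retest times in seconds for three subjects
identical = [[10.0, 10.0], [12.0, 12.0], [15.0, 15.0]]
shifted = [[10.0, 12.0], [12.0, 14.0], [15.0, 17.0]]
print(icc_2_1(identical))
print(icc_2_1(shifted))
```

In the second hypothetical data set every subject is 2 seconds slower at retest. The ranking of subjects is perfectly preserved, yet this form of the ICC falls below 1 because it penalizes the systematic shift between sessions.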
Excellent test-retest reliability does not necessarily ensure that individuals’ repeated performance will be consistent from test to test. Scores may vary, given expected variability of individual performance and measurement error. A measure of absolute reliability helps distinguish “expected” changes from “true” changes in performance. Statistically, absolute reliability is determined by the standard error of measurement (SEM), or the standard deviation of the measurement errors, and a clinically useful mechanism for looking at absolute reliability is the minimal detectable change (MDC) score.
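The relationship between the SEM and the MDC can be sketched with the commonly used formulas SEM = SD × √(1 − ICC) and MDC = z × √2 × SEM. The example below is illustrative only: the 95% confidence level (z = 1.96), the function names, and the sample values are assumptions and are not taken from the studies discussed here.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), with SD the between-subject standard deviation."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem, z=1.96):
    """MDC = z * sqrt(2) * SEM.

    z = 1.96 gives a 95% confidence level (assumed here); z = 1.645 gives 90%.
    The sqrt(2) term accounts for measurement error in both test and retest.
    """
    return z * math.sqrt(2.0) * sem


def exceeds_mdc(observed_change, mdc):
    """True when a change is larger than expected error and variability."""
    return abs(observed_change) > mdc


# Hypothetical values: SD of 4.0 seconds on a timed test, ICC of .91
sem = standard_error_of_measurement(sd=4.0, icc=0.91)
mdc95 = minimal_detectable_change(sem)
print(round(sem, 2), round(mdc95, 2))
print(exceeds_mdc(2.5, mdc95))   # a 2.5-second change
print(exceeds_mdc(-4.0, mdc95))  # a 4.0-second improvement
```

With these hypothetical inputs, even an ICC above .90 leaves an MDC of roughly 3.3 seconds, so a 2.5-second change could not be distinguished from measurement error and individual variability.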
Recent literature presenting TUG and gait speed data for individuals with dementia highlights the importance of understanding relative versus absolute reliability. Even though test-retest reliability coefficients for clinical tests are high, individual variability and measurement error make it very difficult to identify a “true” change in performance over time. Minimal detectable change scores provide researchers and clinicians with the opportunity to determine whether a change in performance is a meaningful change (ie, beyond expected measurement error and individual variability).
Clinical observation in people with AD reveals increasing variability of performance with increasing levels of dementia. The existing literature supports this observation. Although Thomas and Hageman found the TUG to have reasonable test-retest reliability in subjects in day care settings who were considered to have mild to moderate dementia (Mini-Mental Status Examination [MMSE] [SD]