National Cancer Institute

Behavioral Research



Stephen Sutton



Measuring Stage

The Transtheoretical Model

In work on the TTM, three main methods have been used to measure stages: multi-dimensional questionnaires; single-item continuous measures of readiness to change; and staging algorithms and self-categorizations. These are discussed in turn.

Multi-dimensional questionnaires. Most studies that have investigated alcohol and drug use from the standpoint of the TTM have used multi-dimensional questionnaires to measure stage of change. In this approach, each stage is measured by a set of questionnaire items, and scores are derived for each individual representing their position on each dimension. Three such multi-dimensional questionnaires have been used in studies of alcohol and drug use: the University of Rhode Island Change Assessment (URICA; McConnaughy, Prochaska & Velicer, 1983), the Stages of Change Readiness and Treatment Eagerness Scale (SOCRATES; Miller & Tonigan, 1996) and the Readiness to Change Questionnaire (RTCQ; Rollnick, Heather, Gold & Hall, 1992). Consider the URICA as an example of this approach. The URICA was the first multi-dimensional questionnaire designed to measure stages of change, and is intended for use in clinical contexts. It consists of 32 items, eight for each of four stages (precontemplation, contemplation, action, maintenance; Appendix A). The items refer generically to the respondent’s "problem" but do not specify a particular problem behavior.

In studies using the URICA, there is a fairly consistent pattern of relatively large correlations among subscales representing adjacent stages. The correlations between contemplation and maintenance, which are non-adjacent stages, are also relatively large (Sutton, 2001). The observed pattern of correlations suggests that the URICA is not measuring discrete stages. The same applies to the SOCRATES and the RTCQ. Thus, multi-dimensional questionnaires are not consistent with the assumption of discrete stages.

Single-item continuous measures of readiness to change. Biener and Abrams (1991) developed the Contemplation Ladder as a simple measure of readiness to consider smoking cessation. This scale consists of a visual representation of a vertical ladder scaled from 0 to 10, with the scale points 0, 2, 5, 8 and 10 labeled as follows: "No thought of quitting" / "Think I need to consider quitting someday" / "Think I should quit but not quite ready" / "Starting to think about how to change my smoking patterns" / "Taking action to quit (e.g., cutting down, enrolling in a program)". In their sample of smokers recruited from two worksites, Biener and Abrams (1991) found a highly significant correlation of 0.64 between the ladder score and a single-item measure of intention to try to quit. The ladder and intention measures showed similar patterns of correlations with other variables. With regard to predictive validity, ladder scores were more predictive than were intention scores of participation in cotinine assessment and taking a self-test of addiction to nicotine, which Biener and Abrams interpret as earlier stages of readiness, whereas intention was a better predictor of participation in events that required a quit attempt and of successful quitting. Note that the contemplation ladder does not represent the full set of stages: it does not include the action stage (usually defined in work on the TTM as having stopped smoking for up to 6 months) or the maintenance stage (having quit for more than 6 months). The contemplation ladder has been adapted for use with other behaviors, for example marijuana use (Slavet et al., 2006) and gambling (Petry, 2005).

Similar measures were developed by LaBrie, Quinlan, Schiffman and Earleywine (2005) for alcohol and safer sex. These readiness-to-change "rulers" consist of a horizontal line scaled from 0 to 10 with five labels (e.g., for drinking: "Never think about my drinking" / "Sometimes I think about drinking less" / "I have decided to drink less" / "I am already trying to cut back on my drinking" / "My drinking has changed. I now drink less than before"). The alcohol change ruler correlated 0.77 with total score from the RTCQ (Rollnick et al., 1992). Similarly, the safer sex ruler correlated 0.77 with the total score from an 11-item Readiness to Change Risky Sexual Behavior (RTCQ-SB) based on the RTCQ. The rulers showed higher correlations with measures of intentions to change than did the scores from the corresponding multi-item scales. The authors conclude that the rulers may be useful when a quick and simple assessment of readiness to change is required.

Virtually all the studies that have used contemplation ladders or readiness rulers have treated the scores as continuous measures of readiness to change. This approach is inconsistent with the idea that change involves movement through a sequence of discrete stages. Furthermore, the use of measures of behavioral intention as criteria for assessing the validity of ladders and rulers raises the question of whether intention measures could themselves be used as a simple way of assessing readiness to change.

Staging algorithms and self-categorizations. A staging algorithm uses a small number of questionnaire items and a set of rules to allocate participants to stages in such a way that no individual can be in more than one stage. Self-categorizations are single-item measures in which participants are presented with a list of statements, each of which represents a stage, and are asked to select the one that best describes them. Both of these methods are relatively brief and simple compared to multi-dimensional questionnaires. (Staging algorithms could be highly complex, but in practice they use a small number of items and a simple set of rules.) If the resulting measure is analysed as a set of categories rather than as a continuous scale, this approach is in principle more consistent with the assumption of discrete stages than is either of the other two methods.

The few studies that have compared staging algorithms with multi-dimensional questionnaires have found low concordance between them (e.g., Belding, Iguchi & Lamb, 1996), suggesting that they are measuring quite different constructs. A study that compared a staging algorithm with the contemplation ladder divided into three categories (analogous to "precontemplation", "contemplation" and "preparation") found a correlation of 0.58 between the two classification schemes but also some important differences (Herzog, Abrams, Emmons & Linnan, 2000). For example, of those who were classified by the algorithm as precontemplators, 49% were in the "contemplation" group as assessed by the ladder, suggesting that the two measures should not be used interchangeably.

A staging algorithm for smoking that has been used in a large number of studies since it was introduced by DiClemente and colleagues (DiClemente, Prochaska, Fairhurst, Velicer, Velasquez & Rossi, 1991) can be found in Appendix B. Precontemplation, contemplation, and preparation are defined in terms of current behavior, intentions and past behavior (whether or not the person has made a 24-hour quit attempt in the past year), whereas action and maintenance are defined purely in terms of behavior; ex-smokers’ intentions are not taken into account. Although the phrase "seriously thinking of quitting" is used in Appendix B, different versions of this algorithm have used alternative wordings including "seriously considering quitting", "intending to quit" and "planning to quit". Such apparently minor changes in wording can have a large effect on stage distributions (Sutton, 2000).

Critics have pointed out a number of problems with this algorithm, some of which stem from the way that contemplation and preparation are defined (e.g., Borland, Balmford, Segan, Livingston & Owen, 2003; Etter & Sutton, 2002). According to this algorithm, a smoker cannot be in the preparation stage unless he or she has made a recent quit attempt. Thus, a smoker can never be "prepared" for his or her first quit attempt. Similarly, the subgroup of smokers in the contemplation stage who intend to quit in the next 30 days but have not made a quit attempt in the past year cannot move to the preparation stage. Thus, the stages are defined in such a way that some smokers cannot move directly to the next stage in the sequence (Sutton, 2000).
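The allocation rules discussed above can be written out as a short decision procedure. The following Python sketch is an illustration only and is not the wording of Appendix B: the class and field names are hypothetical, and the six-month horizon used for contemplation is an assumption based on the usual TTM definition rather than on anything stated in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokingAnswers:
    """Hypothetical questionnaire answers; field names are illustrative."""
    currently_smokes: bool
    months_since_quit: Optional[float] = None    # ex-smokers only
    thinking_of_quitting_6_months: bool = False  # assumed 6-month horizon
    intends_to_quit_30_days: bool = False
    quit_attempt_past_year: bool = False         # at least one 24-hour attempt


def ttm_smoking_stage(a: SmokingAnswers) -> str:
    """Allocate a respondent to exactly one stage."""
    if not a.currently_smokes:
        # Post-action stages depend on behavior only; intentions are ignored.
        if a.months_since_quit is None:
            raise ValueError("months_since_quit is required for ex-smokers")
        return "action" if a.months_since_quit <= 6 else "maintenance"
    # Preparation requires BOTH a 30-day intention and a recent quit attempt.
    if a.intends_to_quit_30_days and a.quit_attempt_past_year:
        return "preparation"
    if a.thinking_of_quitting_6_months or a.intends_to_quit_30_days:
        return "contemplation"
    return "precontemplation"


# A smoker who intends to quit within 30 days but has made no quit attempt
# in the past year stays in contemplation, however firm the intention is.
first_timer = SmokingAnswers(currently_smokes=True,
                             thinking_of_quitting_6_months=True,
                             intends_to_quit_30_days=True)
print(ttm_smoking_stage(first_timer))  # contemplation
```

The `first_timer` case reproduces the criticism in the text: under these rules, a smoker cannot be classified as "prepared" for a first quit attempt.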

A variant of the TTM developed by a group of researchers in the Netherlands (e.g., Dijkstra, Bakker & De Vries, 1997) uses different definitions of the stages that avoid these problems. In this model, the pre-action stages are defined purely in terms of intention: preparation is defined as planning to quit in the next month and contemplation as planning to quit in the next six months but not in the next month.
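The intention-only definitions can be sketched in the same way. This is a minimal illustration, not the published instrument: the function and parameter names are hypothetical, and treating precontemplation as the residual category (no plan to quit within six months) is an assumption, since only preparation and contemplation are defined above.

```python
def intention_based_stage(plans_quit_next_month: bool,
                          plans_quit_next_6_months: bool) -> str:
    """Pre-action stage for a current smoker, using intention alone."""
    if plans_quit_next_month:
        return "preparation"
    if plans_quit_next_6_months:
        return "contemplation"
    # Assumed residual category; not defined in the text above.
    return "precontemplation"


# No past quit attempt is needed to reach preparation under these rules.
print(intention_based_stage(True, True))   # preparation
print(intention_based_stage(False, True))  # contemplation
```

Because past behavior plays no part in the rules, every smoker can move from contemplation to preparation simply by bringing the planned quit date forward.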

A staging algorithm for exercise developed by Marcus and Simpkin (1993) is in Appendix C. Although this algorithm does not suffer from the logical problems of the DiClemente et al. (1991) smoking algorithm, it seems somewhat implausible to treat irregular exercise (preparation) as a discrete stage between contemplation and action, implying that people move from no exercise to irregular exercise to regular exercise and that irregular exercise is qualitatively different from regular exercise.

A problem affecting many TTM staging algorithms is that the time periods are arbitrary. For instance, action and maintenance are usually distinguished by whether or not the duration of behavior change exceeds six months. Changing the time periods would lead to different stage distributions. The use of arbitrary time periods casts doubt on the assumption that the stages are qualitatively distinct, in other words that they are true stages rather than "pseudostages" – arbitrarily created segments of an underlying continuous variable (Bandura, 1997).
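The sensitivity of stage distributions to the chosen cutoff is easy to demonstrate. In the sketch below, the durations are invented purely for illustration and the function names are hypothetical; the only rule taken from the text is that action and maintenance are separated by a duration threshold.

```python
from collections import Counter


def post_action_stage(months_since_change: float,
                      cutoff_months: float = 6.0) -> str:
    """Split the same continuous duration at an adjustable cutoff."""
    return "action" if months_since_change <= cutoff_months else "maintenance"


def stage_distribution(durations, cutoff_months):
    """Count how many people fall into each post-action stage."""
    return Counter(post_action_stage(m, cutoff_months) for m in durations)


# Invented durations (months since behavior change) for eight people.
durations = [1, 2, 4, 5, 7, 9, 14, 30]

print(stage_distribution(durations, 6))   # 4 in action, 4 in maintenance
print(stage_distribution(durations, 12))  # 6 in action, 2 in maintenance
```

Nothing about the eight people changes between the two calls; only the cutoff moves, which is the sense in which such segments may be "pseudostages".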

Comparison of the algorithms in Appendices B and C illustrates another problem: TTM staging algorithms for different health behaviors often use inconsistent stage definitions. For example, in the algorithm for adoption of mammography (Rakowski, Dube, Marcus, Prochaska, Velicer & Abrams, 1992), action and maintenance are defined partly in terms of intentions (planning to have a mammogram in the coming year). It is possible for a woman to move directly from contemplation to maintenance simply by forming an intention, without passing through the action stage and without changing her behavior.

The Precaution Adoption Process Model

Unlike the TTM, in which a variety of measurement methods have arisen, only staging algorithms have been used in research on the PAPM to date. Appendix D gives a stage classification algorithm that would be suitable for any behavior for which a maintenance stage is not applicable (Weinstein & Sandman, 2002). These include behaviors that, if they are performed at all, are usually performed only once, e.g., having a predictive genetic test for inherited breast/ovarian cancer. Of course, virtually any behavior can be repeated: persons may test their home for radon and then retest it two years later. If a significant proportion of people in the sample have adopted the precaution before, then it may be necessary to take past behavior into account in the analysis and to reword the staging algorithm.

Consider, for example, applying the model to participation in mammography screening. If the investigator is interested in first-time attendance for screening, he or she could either select a sample of women who have recently reached the lower age limit for screening and use the algorithm in Appendix D to stage them, or select a sample of women who have never been screened and follow them over time until some of them have their first screen, using the algorithm to stage the sample on a number of occasions. Women who have had one mammogram could be allocated to stages with respect to having another mammogram. This would require modifications to the algorithm: Stage 1 would not be applicable for these women, and the statement used to classify women in Stage 2 could be reworded to something like "I haven’t thought about whether to have another mammogram". (An alternative approach would be to classify women who have had repeated mammograms in accordance with the recommended schedule as being in the maintenance stage. However, it would be difficult to know how to classify women who have had more than one mammogram but whose pattern of attendance does not conform to the recommended schedule.)

For ongoing behaviors such as the frequency of exercising or the amount of salt consumed per day, it is necessary to define a criterion level of behavior, for example doing at least 30 minutes of moderate physical activity every day. In this case, it would be appropriate to specify a maintenance stage, possibly defined in terms of duration as in the TTM, for example having maintained at least 30 minutes of moderate physical activity a day for at least six months. However, as noted above, such time periods are arbitrary and do not have face validity as marking a transition between discrete stages.

Reliability and Validity of Staging Algorithms and Self-categorizations

Although a staging algorithm uses several questionnaire items, it yields a single measure of stage. Similarly, self-categorizations are single-item measures. Indices of internal consistency such as Cronbach’s alpha are therefore not applicable. However, test-retest reliability can be assessed by measuring stage on two occasions in the same sample. A short time interval (e.g., one week) should be used to reduce the likelihood that true change occurs. Several studies have assessed test-retest reliability of TTM staging algorithms over short time periods (e.g., Aveyard, Lancashire, Almond & Cheng, 2002; Carey, Purnine, Maisto & Carey, 2002; Donovan, Jones, Holman & Corti, 1998; Etter & Sutton, 2002), yielding kappas that ranged from 0.46 to 0.83 ("moderate" to "very good" agreement according to Altman, 1991). The test-retest reliability of the component questions can also be assessed (e.g., Aveyard et al., 2002). The reliability of the PAPM algorithm has not been assessed to date.
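Test-retest agreement of this kind is usually summarized with Cohen's kappa, which corrects the observed proportion of agreement for the agreement expected by chance. The sketch below is a minimal unweighted implementation with hypothetical function names and invented data; the verbal bands in `altman_label` follow the cutoffs commonly attributed to Altman (1991) and should be checked against that source before being relied on.

```python
from collections import Counter


def cohens_kappa(time1, time2):
    """Unweighted Cohen's kappa for two stage classifications of one sample."""
    if len(time1) != len(time2) or not time1:
        raise ValueError("need two non-empty, equally long sequences")
    n = len(time1)
    observed = sum(a == b for a, b in zip(time1, time2)) / n
    c1, c2 = Counter(time1), Counter(time2)
    # Chance agreement from the marginal stage frequencies on each occasion.
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one stage occurs")
    return (observed - expected) / (1 - expected)


def altman_label(kappa: float) -> str:
    """Verbal band for kappa (cutoffs as commonly cited from Altman, 1991)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Invented example: four respondents staged one week apart.
week0 = ["precontemplation", "precontemplation", "contemplation", "contemplation"]
week1 = ["precontemplation", "contemplation", "contemplation", "contemplation"]
k = cohens_kappa(week0, week1)
print(round(k, 2), altman_label(k))  # 0.5 moderate
```

In the invented example, three of four respondents keep the same stage (75% raw agreement), but half of that agreement is expected by chance, so kappa is only 0.5.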

There are several ways of assessing the validity of a staging algorithm or self-categorization. First, these measures incorporate a measure of the behavior (whether the individual is in a pre-action or a post-action stage). Sometimes it may be possible to assess the validity of this aspect of the stage measure against another measure of the behavior that can be assumed to be more valid. In the case of smoking, for example, a biochemical measure such as saliva cotinine concentration could be used to verify self-reported smoking status (i.e., pre-action versus post-action). Use of this method requires that the algorithm incorporates a clear and precise definition of the behavior. Furthermore, this method provides no evidence bearing on the validity or otherwise of the distinction between the pre-action stages or of the distinction between the post-action stages.

A second way to estimate the validity of the component items of a staging algorithm depends on the availability of suitable criteria. For example, intention to obtain a mammogram in the next 6 months can be validated against screening office records of attendance. Finally, the construct validity of a staging algorithm or self-categorization can be assessed by examining whether the stage measure is related to measures of the other constructs in the theory in ways that are predicted by the theory. Thus, a test of the construct validity of a stage measure is inherent in testing predictions from a stage theory. Because there is no ideal exemplar study in the literature, a hypothetical example is given: If a stage theory makes a particular prediction involving the stages, for example, that variables a, b and c influence the transition from stage I to stage II and variables c and d influence the transition from stage II to stage III, support for this prediction in a longitudinal study of stage transitions also provides support for the validity of the stage measure (and for the other measures). However, if the prediction is not supported, the problem could lie with the theory (it is misspecified in this respect) or with the validity of the stage measure or the measures of the other constructs.

Staging algorithms and self-categorizations differ in the degree to which they are transparent to the respondent. With a self-categorization measure, the respondent can see the full set of stages before selecting the one that best applies to them. By contrast, with a staging algorithm, the rules governing allocation to stages are usually not available to the respondent. This difference may have implications for reliability and validity. For example, compared with staging algorithms, self-categorizations may be easier to understand but more susceptible to social desirability bias.


DCCPS | National Cancer Institute | Department of Health and Human Services | National Institutes of Health

Health Behavior Constructs: Theory, Measurement, & Research