SECTION

6 Selecting Measures

This section challenges readers to think specifically about key features of a research study design and understand how those will influence the choice of a measure. The variety of physical activity measures available can be overwhelming and, depending on the features of the project, researchers and professionals must carefully consider the advantages and disadvantages of each measure. To help select a measure, investigators should consider the features of their project. This Guide emphasizes four features: (1) the population being assessed (i.e., children, adults, older adults), (2) the activity outcome (e.g., leisure-time physical activity or school-based physical activity; physical activity volume or physical activity frequency), (3) the research type (e.g., intervention study vs. surveillance), and (4) the resources available (e.g., existing personnel in the project, timeline for data collection, expertise required to handle the data, immediate feedback required).^p These four key considerations (Population, Activity outcome, Research Type, and Resources) are abbreviated with the acronym PARR, and each is described below. The description of the PARR uses the specific example of report- vs. monitor-based measures because these two measures are the most commonly used in studies of physical activity.

Population

The nature of the population is perhaps the most important consideration in selecting an instrument, which is why it is included as a key filter in the NCCOR Measures Registry. Selecting a specific age group (preschool, elementary, adolescent) immediately narrows the choices available for review. Tools are often used across all ages but need to be adjusted to meet the needs of the population of interest. As described in Section 3, children and youth have unique physical activity patterns, cognitive skills, and metabolic responses to physical activity that have to be considered when deciding on what measures to use.

The impact of age on measure selection is well illustrated when choosing between report- vs. monitor-based measures. For example, report-based measures are typically ineffective for preschool children because they require cognitive skills that are still immature at this stage of life. Proxy measures that rely on parents to either assist in the completion of the surveys or report their perceptions about their child’s physical activity levels can provide an alternative report-based approach for this population. Although this is a viable method, parental perception about children’s physical activity may be biased, which can add substantial error to these estimates. The utility of report-based measures increases with age due to more advanced cognitive abilities to recall and report past events. Alternatively, monitor-based measures of physical activity avoid issues with subjectivity, but researchers and practitioners need to carefully consider other limitations of these approaches for children. Pedometers, for example, are easy to use, but children are more likely than adolescents to accidentally reset the device, which deletes the data. A more technical limitation relates to the inability of pedometers to account for differences in leg length, which can complicate age-related comparisons when interpreting overall amounts of physical activity. Activity monitors are also easy to use but pose practical limitations and may not provide the same type of feedback (thereby influencing compliance). An important technical limitation is the challenge of handling the large amounts of data and the implications of the different decisions involved in data collection. There is a consensus, for example, that the use of a 60-second epoch will lead to substantial underestimation of activity levels in preschoolers and young children due to the sporadic nature of their physical activity patterns.^q,25 Readers are referred to Section 10, which provides supplemental information on youth.
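The epoch effect described in footnote q can be made concrete with a minimal sketch. It uses the hypothetical numbers from that footnote (an MVPA cutpoint of 2000 counts per minute and a child who accumulates 1200 counts in 30 seconds, then rests). The function names and the linear scaling of the cutpoint to shorter epochs are assumptions for illustration only; real monitors and published cutpoints differ.

```python
def reintegrate(counts, factor):
    """Sum consecutive groups of `factor` short epochs into longer epochs."""
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


def active_minutes(counts_per_epoch, epoch_seconds, cutpoint_per_minute=2000):
    """Minutes classified as active for a given epoch length.

    The per-minute cutpoint is scaled linearly to the epoch length,
    which is an illustrative assumption, not a validated procedure.
    """
    threshold = cutpoint_per_minute * epoch_seconds / 60
    active_epochs = sum(1 for c in counts_per_epoch if c >= threshold)
    return active_epochs * epoch_seconds / 60


# Footnote q scenario: 1200 counts in the first 30 s, 0 in the next 30 s.
counts_30s = [1200, 0]
counts_60s = reintegrate(counts_30s, 2)        # one 60-s epoch of 1200 counts

minutes_30s = active_minutes(counts_30s, 30)   # first epoch meets the 1000 threshold
minutes_60s = active_minutes(counts_60s, 60)   # 1200 is below the 2000 threshold
print(minutes_30s, minutes_60s)
```

With the shorter epoch, half a minute is classified as active; with the 60-second epoch, the same movement is classified as inactive.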

Activity Outcomes

As described in Section 3, physical activity data must be operationalized so it can be scored and interpreted. Within the broad scope of the definition of physical activity introduced in this section, researchers and practitioners should identify what activity outcome is of most interest by selecting both the domain (e.g., school-based physical activity) and dimension (e.g., intensity) of physical activity being studied. The choice of domain will require that researchers and practitioners select a measure that can capture the appropriate physical activity contexts (e.g., during commuting, recess, physical education, or after-school depending on whether it is leisure, school-related, or other). Most tools are flexible enough to capture this information depending on the data collection protocol used.^r Therefore, this section focuses on issues related to the dimension of physical activity while providing examples of the implications of using either report- or monitor-based measures of physical activity.

In studies related to obesity, the total volume of physical activity (or overall energy expenditure) is usually of interest because it allows the researcher or practitioner to examine energy balance. This indicator results from the combination of frequency and intensity and is highly comparable across different studies or projects. If the volume of physical activity needs to be determined, then the measure selected needs to capture both frequency and intensity of the activity performed over the time period defined by the researcher or practitioner. For example, report-based measures could be a good choice if items ask about the activities performed, duration, and frequency of participation. From the type of activities performed, it is possible to infer the intensity of the activity and convert it to an associated energy expenditure equivalent (i.e., METs). The MET value, duration, and frequency can then be multiplied to obtain the total energy expenditure or physical activity reported by the child (e.g., MET-minutes per week, kcal/day). The challenge with this approach is that it often relies on absolute estimates of energy expenditure, which have some important implications (these have been described in Section 3). Alternatively, monitor-based measures, such as activity monitors, can store recorded movement over several days and weeks and can capture both intensity and frequency of events. Validated calibration equations can be used to convert recorded movement into estimates of energy expenditure.⁷²
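As a rough illustration of the report-based calculation described above, the following sketch multiplies an assumed MET value by the reported duration and weekly frequency of each activity. The MET values, the activity names, and the function `met_minutes_per_week` are hypothetical placeholders; an actual project would take MET values from a published compendium appropriate for the age group.

```python
# Illustrative MET values only (assumed, not taken from a compendium).
ASSUMED_METS = {"walking": 3.5, "soccer": 7.0}


def met_minutes_per_week(reports, mets=ASSUMED_METS):
    """Sum MET x minutes per session x sessions per week over all reported activities."""
    return sum(
        mets[r["activity"]] * r["minutes"] * r["sessions_per_week"]
        for r in reports
    )


# Hypothetical responses from one child's questionnaire.
reports = [
    {"activity": "walking", "minutes": 20, "sessions_per_week": 5},
    {"activity": "soccer", "minutes": 60, "sessions_per_week": 2},
]

total = met_minutes_per_week(reports)  # 3.5*20*5 + 7.0*60*2
print(total)
```

Because the MET values are absolute estimates, the limitation noted above for this approach applies to the result as well.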

Research Type

The research type inherently dictates the relative needs with regard to feasibility versus validity. The feasibility portion of the continuum described in Section 5 is influenced by the funding available for the study or project (i.e., money available to buy either expensive or affordable tools), the necessary sample size (e.g., number of measures needed), and the level of burden placed on the child as a result of the assessment (i.e., time and effort required to comply with the assessment). These factors need to be weighed in relation to the accuracy of the measure. The general considerations with each type of behavioral epidemiology research (see Figure 2) are summarized below. Although the behavioral epidemiology framework includes five categories, they are summarized in three major divisions here.

Basic Research and Health Outcomes Research

Measures of physical activity typically serve as independent variables in basic research and health outcomes research. To elucidate mechanisms and understand health impacts, it is common to compare changes before and after training adaptations take place or between active and inactive groups. Measures used in these types of research often favor precision or accuracy over feasibility. Many designs are possible and the need for precision varies based on the application and outcomes of interest. Some lab-based studies may rely on criterion measures of indirect calorimetry, but field-based studies or longitudinal studies (with larger samples) may necessitate simpler and more feasible assessment options. The choice between report- and monitor-based measures can be difficult, as both sets of tools have unique advantages for this type of research. Report-based measures can facilitate the assessment of large samples, which would improve the representativeness of the population being studied (thereby improving external validity), while monitor-based measures have higher accuracy (thereby improving internal validity). The decision may depend on the sensitivity needed to capture associations with the specific outcome of interest.

For example, most of the scientific evidence on the health benefits of physical activity has been accumulated with report-based measures. The increased availability of monitor-based measures has facilitated the inclusion of these tools in this type of research, but report-based measures are still predominantly used. This is because in many types of epidemiology studies, the main need is simply to classify individuals into general levels of physical activity participation. Report-based measures such as self-reports have proven useful for ranking individuals according to their activity level and therefore provide sufficient accuracy to categorize individuals based on their level of physical activity (e.g., quintiles).
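The ranking idea can be sketched as follows. This is a minimal, assumed approach that sorts participants by their reported activity and splits the ranks into five equal groups; the participant identifiers, the scores, and the function `assign_quintiles` are hypothetical, and ties are broken by identifier for simplicity.

```python
def assign_quintiles(scores):
    """Map participant id -> quintile (1 = least active, 5 = most active) by rank.

    Ties are broken by participant id, which is a simplification.
    """
    ranked = sorted(scores, key=lambda pid: (scores[pid], pid))
    n = len(ranked)
    return {pid: rank * 5 // n + 1 for rank, pid in enumerate(ranked)}


# Hypothetical self-reported MET-minutes per week for ten participants.
scores = {
    "p01": 300, "p02": 1200, "p03": 850, "p04": 90, "p05": 2000,
    "p06": 640, "p07": 1500, "p08": 410, "p09": 980, "p10": 1750,
}

quintiles = assign_quintiles(scores)
print(quintiles)
```

Only the ordering of participants matters for this classification, which is why a measure that ranks individuals reasonably well can be adequate even when its absolute estimates are imprecise.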

This type of stratification is sufficient for some applications, but research in this area is also aimed at determining the dose-response between physical activity and outcomes of interest, and this necessitates more precision (particularly if the goal is to establish clinically meaningful thresholds). More precise estimates of physical activity may be needed if researchers or practitioners are attempting to determine the dose of physical activity necessary to achieve health benefits. The use of activity monitors would be strongly recommended in these situations even though some procedures would allow the improvement of the estimates obtained from self-reports.⁷³ Obtaining precise estimates of physical activity at the individual level is still challenging with activity monitors, but the error is substantially lower when compared to self-reports.^s The need for precision is also greater in situations where the association between physical activity and the health outcome is subtler or harder to detect. In these situations, the more precise the measure, the greater the likelihood that it will be able to capture the associations and possible effect of physical activity on the outcome of interest. The diverse range of applications and designs makes it difficult to generalize about the most appropriate measures.

Surveillance Research

The goal in many surveillance applications is to evaluate levels of physical activity in the population, so physical activity measures most typically serve as dependent variables. Measures used in studies or projects of this nature tend to emphasize feasibility over validity due to the greater emphasis on sampling and external validity. Report-based measures have historically been more common in these types of studies but monitor-based measures are now widely used in large-scale surveillance applications. The key need is to capture population-level estimates so emphasis is on ensuring that the measures have adequate group-level measurement properties (see Section 4). Unique challenges related to this design involve selecting a tool that is feasible for large samples; that has measurement properties not affected by the population being assessed (e.g., equally valid for youth ages 8 to 18 years, or equally valid for youth of different countries); that is sensitive enough to capture sex, seasonal, or age-related group differences in physical activity; or that can capture either changes over time or differences between subgroups (e.g., boys vs. girls, children vs. adolescents).

Report-based measures (primarily self-reports) are the most common tool for surveillance studies, but it can be challenging to find a survey that can fulfill critical measurement requirements. For example, well-designed self-report tools can provide reasonable estimates for some groups of children, but the instrument may have differential properties (e.g., reliability and validity) for various ages or in youth from different backgrounds and may lack the ability to capture physical activity changes over time. The International Physical Activity Questionnaire, for instance, was developed specifically to standardize a self-report tool so that it could be used in different countries to provide a common metric. Although report-based measures are easier to use and less expensive, monitor-based measures offer advantages for standardization because they capture only the movement performed. The use of monitor-based measures is now common in both large population surveys and smaller studies in which the focus is on comparing levels of physical activity in different groups or segments of the population.

Theory and Correlate Research/Intervention Research

Theory and correlate research often focuses on identifying factors (i.e., correlates) that may explain differences in physical activity levels in the population or in testing theories that may explain physical activity behavior. Intervention research then seeks to use insights to plan and evaluate strategies designed to promote physical activity in the population. Designs in these realms of behavioral research vary widely and physical activity can serve as either an independent variable or a dependent variable. Studies, for example, may compare a battery of psychosocial predictors for active or inactive individuals (physical activity as an independent variable) or use a battery of correlates to explain physical activity behavior (physical activity as a dependent variable). Studies can be set up to compare health outcomes in groups with different levels of physical activity (physical activity as an independent variable) or to quantify actual differences in physical activity outcomes (physical activity as a dependent variable). Measures used in theory and correlate research tend to be at the middle to high left end of the feasibility vs. validity continuum, while measures for intervention designs are often at the middle and lower right end of the continuum because precise estimates are often needed to detect any small differences between intervention groups. The distinctions in the design can have important implications for the need for precision and the type of measure that would work best.

Overall, report-based measures (self-report measures in particular) are still very popular in these study designs because they can not only provide reasonable estimates of group-level physical activity but also add contextual information that is usually of interest depending on the intervention being conducted. However, one key need in interventions is the ability to detect changes in physical activity as a result of the intervention or to capture differences in physical activity among people with different health conditions. Both require a level of precision that is often not characteristic of self-report tools. This level of precision requires measures that can provide more accurate estimates at the individual level, and this need can vary depending on the expected impact of a physical activity intervention on activity levels or the association between physical activity and a health outcome. For example, in situations where the intervention is expected to increase physical activity by a small but still meaningful amount (e.g., increase physical activity during recess), researchers or practitioners might need a precise measure that is sensitive to these changes in physical activity. Assuming recess is likely to last for 15 to 20 minutes, the measure must be able to capture changes of 10 to 15 minutes of activity or less during the recess period. It would be unrealistic to expect that a self-report can capture such effects. Monitor-based measures would be better suited for this purpose. However, depending on the monitor-based measure, important drawbacks may need to be considered. This is particularly true for pedometers, which are known to also serve as a motivational tool because they can provide immediate and interpretable feedback. Their use is strongly recommended in studies examining motivation. However, their use in interventions that are aimed at manipulating other factors (e.g., inducing changes in the environment) can add bias when determining the treatment (e.g., physical activity program) effect or changes in physical activity levels.

Resources

The selection of a measure also will need to take into consideration the timeline for the project or study, available or planned human resources allocated for the project or study, and the need for immediate feedback. The timeline can often dictate the schedule and timing of data collection and this, in turn, can dictate the most practical assessment strategy. The ratio between the sample size and the timeline can give a good indication of what measure property needs to be prioritized. The human resources relate to the availability of human capital to collect data or the expertise required to handle the data processing tasks, while immediate feedback involves having estimates of physical activity available to participants as they participate in the measurement protocol.

If the sample size is large or the timeline is short, a simpler and more practical assessment, such as a self-report tool, may be warranted. However, if the sample size is small or the timeline is long, it becomes more feasible to use a monitor-based approach. This factor plays an important role in the measurement protocol and is heavily influenced by the availability of staff to collect and process the data. Having more staff will allow for more intense data collection protocols and can increase the likelihood that more people can be assessed per unit of time. The availability of staff is critical for both data collection (e.g., setting up devices or assisting with completion of surveys) and the data processing steps. In general, report-based measures are better suited for short timelines and large sample sizes, while monitor-based measures might work better for a larger window of time for data collection and small to medium size samples. Again, this relation is further compounded by the availability of research staff. For example, if the size of the research team is limited, it becomes more challenging to initialize and distribute a large number of activity monitors or to download and process data from multiple monitors. In this case, it may be necessary to reduce the sample size or to collect data over a longer span of time to ensure that data on a sufficient number of participants can be collected. With monitor-based methods, the availability of monitors also can become a rate-limiting factor and may dictate the rate at which data can be collected. Therefore, it is important to carefully evaluate both the timeline and the human resources available to determine how many assessments can be conducted per day or week. This will directly determine the feasibility of a given assessment approach.
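A back-of-the-envelope calculation can help with this evaluation when the number of monitors is the rate-limiting factor. The sketch below is only an assumed planning model: the function `weeks_needed`, the 7-day wear period, and the 2-day turnaround for downloading and re-initializing devices are hypothetical values, not recommendations.

```python
import math


def weeks_needed(sample_size, monitors_available, wear_days=7, turnaround_days=2):
    """Estimate calendar weeks needed to assess all participants.

    Each monitor is tied up for the wear period plus a turnaround period
    (download, recharge, re-initialize) before it can be redistributed.
    All default values are assumptions for illustration.
    """
    cycle_days = wear_days + turnaround_days
    cycles = math.ceil(sample_size / monitors_available)
    return cycles * cycle_days / 7


# Hypothetical project: 200 children, 40 monitors, 8-week data collection window.
weeks = weeks_needed(sample_size=200, monitors_available=40)
fits_timeline = weeks <= 8
print(round(weeks, 1), fits_timeline)
```

If the estimate exceeds the available window, the options described above apply: reduce the sample size, extend the timeline, or obtain more monitors or staff.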

Two other important factors that are often overlooked are the expertise required to handle the data collected and the need for immediate feedback. Overall, data processing protocols for report-based measures often require less technical work, while monitor-based measures are highly susceptible to processing decisions and require a greater level of expertise. For example, the staff working with report-based tools may only need to calculate energy expenditure or physical activity variables while staff collecting monitor-based tools may need to be familiar with software and data processing methods. With regard to feedback, most report- and monitor-based measures require some level of processing before data can be interpreted and feedback provided, but there are some exceptions. For example, web-based questionnaires can automate processing and offer immediate feedback. Similarly, pedometers and some consumer-based devices can provide immediate feedback to participants (i.e., number of accumulated steps). This type of feedback can be an advantage for intervention studies focused on changing behavior but can be problematic for studies attempting to capture “typical” behavior. There is considerable interest in new lines of consumer activity monitors but more work is needed to understand the measurement properties of the various models available (see Section 9). Thus, the need for immediate feedback can narrow the list of measures suitable for a particular project, but some of the challenges can be overcome depending on the human resources allocated for the project. A larger research team will permit staff members to be allocated to data processing once data are collected and can therefore provide feedback once the protocol is completed. This approach does not replace the need for immediate feedback but instead allows for feedback at the end of participation in the project.

Decision Confirmation

A final step, once a set of possible measures is selected, involves filtering among the available report-based or monitor-based measures, models, or versions to determine which is most appropriate considering their measurement properties when applied to the population of interest. The NCCOR Measures Registry helps to summarize the documented evidence regarding the different tools to facilitate this review. For example, if the assessments are to be conducted in adolescents, then researchers or professionals can filter the existing measures for adolescents ages 12 to 18 years, and then select the type of measure preferred. Once the measure is selected, it is possible to access a variety of studies that summarize the properties or measurement characteristics of the instrument. This information can help determine whether the measure is indeed appropriate for the design of the research study or project. It is important to note that the Measures Registry only summarizes the information. It is up to the researcher to carefully review the findings and to determine whether the results generalize or apply to their situation.

Before formal adoption and use, it is essential to test the selected measure under real-world conditions through pilot testing or a formalized feasibility study. Pilot testing involves replicating the design of the project or study but in a small fraction of the population of interest. Pilot testing can help determine whether the physical activity assessment protocol is appropriate. More importantly, such a step allows the researcher to test the different steps associated with the measurement protocol: preparing for data collection, collecting data, and processing and handling the data generated from the collection phase. This is particularly important if the user is not familiar with the tool selected. Some considerations for data processing and data management are summarized in Section 10. However, it is first important to provide practical applications of how to select a measure.

^p This list of considerations is aligned with the steps proposed by Strath and colleagues (see reference 3), but differs in some respects. An additional consideration of “Population” was added because issues with assessments vary greatly by age and other demographic factors. Several of the steps proposed by Strath and colleagues also were combined to facilitate interpretation.

^q Assume that a certain activity monitor uses a cutpoint for MVPA of 2000 counts per minute. A child who is playing a tag game can accumulate 1200 counts in 30 seconds and then remain sitting or standing for the other 30 seconds, accumulating 0 counts during the remaining fraction of the minute. The aggregated counts for this entire minute (1200) would be less than the threshold (i.e., < 2000) and indicate that the child was not active during that minute even though half of the time was spent running. If the epoch were 30 seconds, the counts for the first epoch (1200) would exceed an adjusted threshold (e.g., 1000) and that period would be categorized as active. The use of 1-minute epochs essentially “ignores” these shorter bouts of activity, resulting in underestimations of activity levels in children.

^r Existing self-report measures tend to overlook important physical activity domains, such as activity associated with transportation, while monitor-based measures cannot provide direct information about context. In both cases, the measurement protocol can be adapted to capture this information by either including additional items in a self-report (report-based measures) or obtaining detailed schedule information (e.g., school time) and extracting raw data during the period of interest (monitor-based measures).

^s The use of measurement error models has helped to refine the precision of some self-report measures. This has been shown to strengthen the associations between physical activity and health outcomes such as obesity or diabetes by 30% to 50% (see Reference 88).

Measures Registry USER GUIDES