SECTION 5

Overview of Physical Activity Assessment Tools

A number of techniques and tools are available to assess physical activity in children and youth, each with advantages and disadvantages. This section provides a brief description of physical activity measures while establishing an important relation between feasibility and validity. Many excellent review papers have already summarized the utility of various assessment tools^1,4,8,40,41 so this section emphasizes the considerations that are important when comparing tools described in the NCCOR Measures Registry.

The Feasibility/Validity Continuum

A common method for comparing measures is to determine their validity in relation to their feasibility. This relationship is illustrated in Figure 5A.^f Measures that are highly valid are often too expensive or cumbersome to use in large-scale research applications. Measures that are more feasible to use may have lower validity. For simplicity, the measures are aggregated into three main categories (report-based measures, monitor-based measures, and criterion measures). The feasibility/validity continuum has been developed and discussed for measures of energy expenditure. However, there is a growing realization that energy expenditure alone does not fully capture other important aspects of physical activity, such as strength, flexibility, or other behaviors.⁴² Bowles et al. have discussed this issue in the context of measurement challenges related to distinguishing between behavior, activity, and motion. They describe “motion,” which can be captured by devices such as accelerometers; “activity,” which represents a class of actions such as cleaning house or playing soccer; and “behavior,” which is a specific action embedded in the activity, such as standing in the goalie box or wiping the windows with a paper towel.⁶ More work is needed to explore the feasibility/validity continuum for measurements of behavior and activity, but clearly it will differ from the continuum for estimates of energy expenditure. For example, DLW has essentially zero validity for measuring specific activities because it measures aggregate energy expenditure over one or more days. Figure 5B^g illustrates the feasibility/validity continuum for measures of behavior. Note the distinct differences in the criterion measures and the importance of physical activity recall and time use methods.

Figure 5A: Physical Activity Assessment Tools and Their Relative Positions on the Feasibility/Validity Continuum

Report-based measures include self-report tools and diaries that are designed to provide subjective information about physical activity levels and the context of physical activity behaviors. These measures tend to be the most feasible to use (due to both lower administrative and processing costs) but also tend to have lower validity. The choice of measurement approach for a given application depends to a great extent on the relative importance of feasibility and validity, but a number of other factors also come into play when selecting a measure.

Monitor-based measures include various devices designed to objectively quantify movement, such as accelerometers, GPS units, and pedometers or devices to measure the intensity and duration of physical activity such as heart rate monitors. These measures have a good balance between feasibility and validity, making them attractive for a number of research and evaluation applications.

Criterion measures include the DLW method, indirect calorimetry, and direct observation. These measures provide criterion estimates of energy expenditure and movement and are typically used for validation studies, smaller applications, or in lab-based study designs where precise indicators are needed.

Figure 5B: Feasibility/Validity Continuum for Physical Activity Behavior

The above assessments apply to efforts aimed at measuring energy expenditure from physical activity. This is likely to be important for many health studies because of the links between level of energy expenditure and diverse health and energy balance-related benefits of physical activity. However, studies focused on the social, strength, or flexibility benefits of physical activity may require selecting measures from a distinct feasibility/validity continuum. More work is needed to better categorize measures as focusing on one or more of these three aspects of physical activity: behavior, activity, and motion. In any case, a critical step in selecting measures is careful thinking about which aspect of physical activity needs to be measured.

As previously described in Section 3 and Section 4, measures obtained from participants are often raw movement or raw reports of behaviors performed. To make the raw measures useful for physical activity research, they are typically calibrated against other, more valid “criterion” measures. Thus, a key goal of calibration is to minimize error in the estimates from more feasible measures and to make physical activity values more interpretable. DLW is known to provide the most precise estimate of energy expenditure. However, it is very expensive and not practical for calibration applications. Therefore, researchers have more commonly used other measures of energy expenditure, such as indirect calorimetry. Monitor-based measures, for example, are commonly calibrated against indirect calorimetry systems to establish the relationship between movement and energy expenditure or exercise intensity. This process typically involves having individuals perform a series of different activities while being simultaneously assessed with both a monitor-based measure and indirect calorimetry (criterion measure). Direct observation is not as commonly used for calibration, but it is often used for validation purposes to test the classification of observed behaviors.
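
The calibration step described above can be illustrated with a minimal Python sketch. It fits a straight line relating monitor output to a criterion value by ordinary least squares. The data values, the function names, and the choice of a single linear equation are assumptions made for illustration only; they are not taken from this Guide, and published calibration equations are often more complex.

```python
from statistics import mean


def fit_linear_calibration(monitor_values, criterion_values):
    """Fit criterion ~ intercept + slope * monitor by ordinary least squares."""
    if len(monitor_values) != len(criterion_values) or len(monitor_values) < 2:
        raise ValueError("need at least two paired observations")
    x_bar = mean(monitor_values)
    y_bar = mean(criterion_values)
    sxx = sum((x - x_bar) ** 2 for x in monitor_values)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(monitor_values, criterion_values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up paired data: one value per activity in a structured protocol.
counts_per_min = [0, 500, 1500, 3000, 5000]   # monitor output
measured_mets = [1.0, 2.0, 4.0, 7.0, 11.0]    # indirect calorimetry (criterion)

intercept, slope = fit_linear_calibration(counts_per_min, measured_mets)


def predict_mets(counts):
    """Apply the fitted calibration equation to new monitor data."""
    return intercept + slope * counts
```

Once fitted, an equation of this kind can be applied to monitor data collected under free-living conditions, with the caveat that its accuracy depends on how well the calibration activities represent the activities actually performed.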

Although most work has focused on calibration of activity monitors, estimates obtained from self-report measures also need to be calibrated. Most, if not all, report-based measures are designed to capture free-living activity. Therefore, calibration of these measures requires a criterion measure that is also designed to capture activity in a variety of contexts. Indirect calorimetry is not well-suited to calibrate these measures, so monitor-based measures are typically the best option for calibrating report-based measures. The criterion measure in this case still has considerable error, so calibration of report-based measures will only be as accurate as the underlying criterion measure to which it was related. New methods and approaches offer promise for improving calibration methods and these are briefly introduced in Section 8.

A detailed explanation of calibration research is beyond the scope of this Guide, but it is important for researchers to have a conceptual understanding of the calibration process because it provides the foundation for how measures are used to “assess” physical activity behavior. The description of each tool will be presented in relation to its role in calibration and with reference to the inherent balance between feasibility and validity. The criterion measures will be introduced first, followed by monitor-based measures and report-based measures.

Summary of Assessment Tools

The following sections describe the major types of assessment tools used to measure individual physical activity.

Report-based Measures

Report-based measures described in this Guide include various self-report surveys, such as physical activity questionnaires and diaries, that capture a participant’s perception and interpretation of physical activity behavior. These tools also can be defined as subjective measures because they rely on the person's ability to interpret and recall physical activity. They are generally categorized by mode of administration: self-administered or interview (most are self-administered). Recall-based self-reports may prompt users to recall time periods ranging from 1 day to 3 months. The time needed to complete the questionnaires may range from 1 to 20 minutes, with most of the self-report measures requiring less than 10 minutes to complete. Diaries have very different properties and characteristics because they generally require that a person record the activity performed throughout the day or right after it occurs. The level of detail varies with the instrument, and a number of logging tools are now available for real-time tracking with cell phone applications (see Section 8). Regardless of form, the information collected from self-reports and diaries is often converted to measures of energy expenditure (e.g., kcal). Both can provide information on frequency, intensity, duration, and type of physical activity, and both can capture the context of physical activity (e.g., inside vs. outside). The frequency and duration of the activities reported can be useful in determining compliance with physical activity guidelines and in computing the volume of physical activity performed during a pre-defined time window.
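
The conversion of reported activities to energy expenditure can be sketched as follows. This minimal Python example assumes the common convention that 1 MET corresponds to roughly 1 kcal per kg of body mass per hour. That convention is adult-based and may not hold well for youth, and the MET values, body mass, and activity entries shown are hypothetical rather than taken from any compendium or from this Guide.

```python
def activity_kcal(met_value, weight_kg, minutes):
    """Approximate kcal as MET x body mass (kg) x duration (hours).

    Assumes 1 MET is about 1 kcal/kg/h, which is an adult-based convention.
    """
    return met_value * weight_kg * (minutes / 60.0)


# Hypothetical recall entries: (activity, assumed MET value, minutes reported)
reported = [
    ("walking to school", 3.5, 20),
    ("soccer practice", 7.0, 45),
]
weight_kg = 40.0  # hypothetical participant

total_kcal = sum(activity_kcal(met, weight_kg, minutes)
                 for _, met, minutes in reported)
total_minutes = sum(minutes for _, _, minutes in reported)

print(f"Reported volume: {total_minutes} min, about {total_kcal:.0f} kcal")
```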

Four major categories of self-report assessments are records or logs,^h recall questionnaires, quantitative histories,ⁱ and global self-reports.^j Recall questionnaires tend to be used more often than other types of self-report assessment, and examples for youth include the Previous Day Physical Activity Recall (1d-PAR), the 3-day Physical Activity Recall (3d-PAR), and the Youth Activity Profile (YAP). A major limitation of recall questionnaires is the level of subjectivity involved in the estimates provided and the challenges imposed on participants by the recall process (this is particularly true in youth, as described in Section 3). Another challenge is the limited utility of questionnaires for directly estimating time in physical activity or for computing total daily energy expenditure values (other than by referring to standardized estimates of the energy cost of activities). Therefore, individual error tends to be very high. Individual error also often compromises the ability of a self-report to capture changes in physical activity that might occur over time or as a result of a treatment effect. However, with calibration, it is possible to model the error from these tools and produce group estimates of physical activity that mirror estimates obtained from more accurate measures, such as activity monitors. Recall questionnaires are easy to administer and inexpensive and are the only field measure that can capture both the type and the context of physical activity. These particular attributes can offset the limitations described above. More detail on self-reports is available elsewhere.^10,30,33,43-48

Monitor-based Measures

Activity monitors: Activity monitors provide a good balance between accuracy and feasibility and therefore are widely used in contemporary physical activity research applications. They have historically been worn on the waist but recent trends have shifted to wrist-worn monitors.^k Most devices use internal accelerometers to obtain an objective indicator of the amount of movement being performed. The resulting value has typically been called an “activity count,” which is a dimensionless value that is difficult to interpret because it has no real physical or physiological meaning. Activity counts have been calibrated to output meaningful outcomes such as energy expenditure and METs. These calibration equations (in addition to activity counts) typically use information, such as body weight, age, and sex, to predict energy expenditure and these estimates are often categorized into sedentary, light-, or moderate- and vigorous-intensity physical activity to determine the time spent in the different activities (e.g., percent time in moderate- and vigorous-intensity physical activity). The calibration process enables the monitors to evaluate compliance with physical activity guidelines, and overall, these tools can capture all domains of physical activity except for the “type” of activity being performed.^l
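
The categorization of monitor output into intensity levels can be sketched in Python as follows. The cut-point values, the one-minute epoch length, and the example data are hypothetical and chosen only for illustration; actual cut-points depend on the device, wear location, and the calibration study from which they were derived.

```python
import bisect
from collections import Counter

# Hypothetical lower bounds (counts/min) for light and MVPA.
# Values below the first bound are classified as sedentary.
LOWER_BOUNDS = [100, 2000]
LABELS = ["sedentary", "light", "mvpa"]


def classify_epoch(counts_per_min):
    """Return the intensity category for one 1-minute epoch."""
    return LABELS[bisect.bisect_right(LOWER_BOUNDS, counts_per_min)]


def percent_time_by_category(epochs):
    """Percent of epochs falling in each intensity category."""
    tally = Counter(classify_epoch(c) for c in epochs)
    return {label: 100.0 * tally[label] / len(epochs) for label in LABELS}


# Made-up activity counts for ten consecutive 1-minute epochs.
epochs = [0, 50, 150, 800, 2500, 4000, 90, 1200, 3000, 10]
percent = percent_time_by_category(epochs)
print(percent)
```

Summing the minutes classified as MVPA over a day gives the value typically compared against physical activity guidelines.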

Popular activity monitors include the ActiGraph and the GENEActiv, but a number of devices are available and each has different features and capabilities. The price of monitors can range from $200 to $400 each, and it may be necessary to use customized software from the manufacturer to process the data (at an additional cost). Limitations to consider (other than the high cost) include the burden placed on participants, who often have to wear the monitor for long periods of time (e.g., 7 days or more). Additionally, monitors can place some burden on researchers and practitioners when extracting or processing the data. Another key limitation is that waist-worn monitors are not well-suited for capturing cycling, non-ambulatory movements, weight-bearing activities, or upper body activities (wrist-worn monitors can overcome some of these limitations). Despite these limitations, the objective nature of monitoring devices offers many advantages for field-based research, including the ability to detect the magnitude and temporal characteristics of the movement. Monitor-based tools continue to evolve along with methods for processing and calibrating the data from these devices. However, it is important to note that advances in technology and methods often come at the expense of feasibility. Recent trends in monitor technology are briefly described in Section 8, but readers are referred to other scientific papers for more detailed reviews.^42,49-53

Heart rate monitors: Heart rate monitors are no longer as popular as they were in the early studies of physical activity behavior, but they are common in exercise training applications and for absolute determinations of exercise intensity. These measures capture the physiological response to movement in terms of heart rate (usually expressed in beats per minute [bpm]) and reflect the level of stress imposed on the cardiorespiratory system. Early heart rate monitors relied on chest straps, but contemporary monitors can measure heart rate through an optical sensor built into a wristwatch (i.e., photoplethysmography). Heart rate is assumed to be linearly related to oxygen consumption. This relation serves as the foundation for existing calibrated heart rate measures, which provide estimates of energy expenditure (e.g., in kcal/day or kJ) and can be used to discriminate between different activity intensities.^m Heart rate thresholds also have been used to determine the time or percent of time that individuals spend above pre-defined intensity levels (e.g., 140 bpm indicating moderate intensity). Heart rate monitors can assess frequency, intensity, and duration of physical activity, but like activity monitors, they provide no information about type or context of physical activity.

A variety of heart rate monitors are available on the market and the cost is typically $100 or higher depending on additional features and design. The key limitations of early heart rate monitors included the lack of accuracy to discriminate intensity at the lower spectrum of activities and the susceptibility of heart rate to factors other than movement (e.g., ambient temperature, level of hydration, anxiety). Newer heart rate monitors (placed on the wrist) rely on optical sensors to measure heart rate, but these alone have been deemed inaccurate during more vigorous exercise (e.g., running at a speed >6.0 mph). For these reasons, heart rate monitors are more commonly used in combination with other measures (e.g., activity monitors) or for controlled laboratory-based studies. Heart rate monitors are particularly useful for monitoring non-ambulatory activities, such as cycling or swimming, and for evaluating individual responses to physical activity. A unique advantage is that heart rate monitors objectively capture the relative level of stress. The thresholds used to determine relative intensity are usually based on individually calibrated heart rate values that take into account resting or maximal heart rate (i.e., expressed as percent of maximal heart rate or percent of heart rate reserve).^54-58
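
The two relative-intensity expressions mentioned above can be sketched in Python. Percent of heart rate reserve is computed here with the standard definition (heart rate above resting, divided by the difference between maximal and resting heart rate). The function names and the example heart rate values are hypothetical and are not taken from this Guide.

```python
def percent_hr_max(hr_bpm, hr_max_bpm):
    """Heart rate expressed as a percent of maximal heart rate."""
    return 100.0 * hr_bpm / hr_max_bpm


def percent_hr_reserve(hr_bpm, hr_rest_bpm, hr_max_bpm):
    """Heart rate expressed as a percent of heart rate reserve.

    Heart rate reserve is the difference between maximal and resting heart rate.
    """
    if hr_max_bpm <= hr_rest_bpm:
        raise ValueError("maximal heart rate must exceed resting heart rate")
    return 100.0 * (hr_bpm - hr_rest_bpm) / (hr_max_bpm - hr_rest_bpm)


# Hypothetical participant: resting 70 bpm, maximal 200 bpm, observed 140 bpm.
print(round(percent_hr_max(140, 200), 1))
print(round(percent_hr_reserve(140, 70, 200), 1))
```

The same observed heart rate can therefore correspond to different relative intensities for individuals with different resting or maximal values, which is why individual calibration matters.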

Pedometers: Pedometers are objective monitoring devices designed solely to quantify the number of steps performed as an indicator of movement. Pedometers have evolved substantially; early models used a horizontal lever-arm or piezo-electric mechanism that captured vertical accelerations of the hip, whereas many of the most recent models use accelerometers to detect the number of steps. Pedometers track vertical hip movements that are recorded as steps, and the data usually can be stored for 24 hours or several days depending on the pedometer. Pedometers can capture the frequency of movement (i.e., number of steps) but also are able to produce estimates of the distance covered (i.e., number of steps × individual stride length). The ability to predict energy expenditure is limited, but some device-specific algorithms have been developed for this purpose. The number of steps accumulated is usually expressed per day, and recommended values for youth can range from 10,000 to 15,000 steps/day depending on age and sex. These recommendations often lack a criterion outcome to define sufficient steps for health. However, some of the guidelines have been validated to differentiate between youth of different body weight status, while others have been generated to reflect an equivalent of 60 minutes per day of MVPA.
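
The step-based outcomes described above reduce to simple arithmetic, sketched here in Python. The daily step totals, the stride length, and the step target are hypothetical values chosen for illustration; an appropriate target depends on age and sex, as noted above.

```python
def distance_km(steps, stride_length_m):
    """Distance covered = number of steps x individual stride length."""
    return steps * stride_length_m / 1000.0


# Hypothetical data for one participant over four monitored days.
daily_steps = [9500, 12000, 11000, 14500]
stride_length_m = 0.6        # assumed individual stride length
step_target = 12000          # assumed daily target for this participant

mean_steps = sum(daily_steps) / len(daily_steps)
days_meeting_target = sum(1 for s in daily_steps if s >= step_target)

print(f"Mean steps/day: {mean_steps:.0f}")
print(f"Days meeting target: {days_meeting_target} of {len(daily_steps)}")
print(f"Distance on day 2: {distance_km(daily_steps[1], stride_length_m):.1f} km")
```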

Many brands of pedometers are available ranging in price from $10 to $200 each depending on the features and memory capability. The key limitations of pedometers include the inability to capture non-ambulatory activities (e.g., cycling), and the level of inaccuracy when predicting energy expenditure. However, the key advantage is the ease of use and the reliability and validity for estimating steps accumulated (at walking speeds) during the day. Steps can provide a good indicator of overall physical activity patterns among youth because a substantial portion of their physical activity derives from lower-body, locomotor movement. Pedometers also are particularly useful as a motivational tool and therefore are widely used in physical activity promotion studies. More detail is available elsewhere.^50,59-64

Criterion Measures

Doubly-labeled water (DLW): Doubly-labeled water is the most accurate measure of total energy expenditure and allows an activity’s energy expenditure to be determined if estimates for the thermic effect of food and resting energy expenditure are available. Total energy expenditure as measured by DLW is determined by evaluating the metabolic breakdown of two stable isotopes (deuterium [²H] and oxygen-18 [¹⁸O]) over time. The isotope-labeled water is administered orally using standardized doses depending on the individual’s total body water. The evaluation requires between 7 and 21 days with the traces of the isotopes obtained through sequential urine samples. The rate of depletion of the isotopes is used to estimate total carbon dioxide produced over time and ultimately a calculation of total energy expenditure. This technique is extremely expensive, requires advanced expertise to handle both the measurement protocol and data processing, and estimates are limited to total energy expenditure. Thus, this tool is not able to assess other dimensions of physical activity, such as intensity, duration, frequency, and type. Despite these limitations, DLW provides the most accurate measure of total energy expenditure and is particularly useful for measurement protocols aimed at providing a summary measure of overall free-living energy expenditure.
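
The derivation of activity energy expenditure from a DLW estimate can be sketched in Python as a simple subtraction. The example values are hypothetical. The default of treating the thermic effect of food as 10% of total energy expenditure is a commonly used simplification and is an assumption here, not a value stated in this Guide.

```python
def activity_energy_expenditure(tee_kcal, ree_kcal, tef_kcal=None,
                                tef_fraction=0.10):
    """Activity energy expenditure = TEE - REE - TEF (all in kcal/day).

    If no measured TEF is supplied, it is approximated as a fraction of TEE
    (assumed 10% by default).
    """
    if tef_kcal is None:
        tef_kcal = tef_fraction * tee_kcal
    return tee_kcal - ree_kcal - tef_kcal


# Hypothetical values: TEE from DLW, REE from a separate resting measurement.
aee = activity_energy_expenditure(tee_kcal=2000.0, ree_kcal=1200.0)
print(f"Estimated activity energy expenditure: {aee:.0f} kcal/day")
```

Because DLW yields a single total over the measurement period, the result is an average daily value and carries no information about when or how the activity occurred.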

Calorimetry: Calorimetry is a method based on the measurement of heat released by the chemical processes that occur when metabolizing different body substrates (e.g., carbohydrates, fat, or protein). The energy associated with these chemical processes can be inferred by determining the amount of heat released from the body, using either direct or indirect calorimetry. Direct calorimetry involves the direct measurement of body heat released to the air and requires the use of a room calorimeter (also known as a heat chamber), so it is not commonly used. Indirect calorimetry is a widely accepted and more practical alternative for the measurement of energy expenditure. It provides an estimate of heat produced based on the relation between oxygen consumed and carbon dioxide produced, typically referred to as the respiratory exchange ratio. The method relies on the assumption that 1 liter of consumed oxygen is equivalent to a known amount of energy (in kcal) that depends on the substrate being metabolized. For simplicity, a respiratory exchange ratio of 1.0 and a caloric equivalent of 5.0 kcal per liter of oxygen consumed are often assumed. Energy expenditure is commonly measured with indirect calorimetry using laboratory oxygen and carbon dioxide gas analyzers or a portable gas exchange/analysis system.ⁿ This method is commonly used as a criterion measure to establish relationships between movement and estimates of energy expenditure from monitor-based tools.^65-66
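
The simplified indirect calorimetry calculation described above can be sketched in Python. The caloric equivalent of 5.0 kcal per liter of oxygen is the simplification mentioned in the text; the gas exchange values and the function names are hypothetical.

```python
def respiratory_exchange_ratio(vco2_l_per_min, vo2_l_per_min):
    """Ratio of carbon dioxide produced to oxygen consumed."""
    return vco2_l_per_min / vo2_l_per_min


def energy_expenditure_kcal(vo2_l_per_min, minutes, kcal_per_liter_o2=5.0):
    """Energy expenditure from oxygen consumption.

    Uses a fixed caloric equivalent (5.0 kcal per liter of O2 by default)
    rather than a substrate-specific value.
    """
    return vo2_l_per_min * minutes * kcal_per_liter_o2


# Hypothetical steady-state measurement during a 10-minute activity bout.
vo2 = 1.2    # liters of O2 consumed per minute
vco2 = 1.02  # liters of CO2 produced per minute

rer = respiratory_exchange_ratio(vco2, vo2)
bout_kcal = energy_expenditure_kcal(vo2, minutes=10)
print(f"RER: {rer:.2f}, energy expenditure: {bout_kcal:.0f} kcal")
```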

Direct observation: Direct observation is considered to be a gold standard method of physical activity assessment because behavior is directly observed. Observation typically involves the choice of a participant to observe (because it is not possible to observe all participants at the same time), when to watch (because it is not practical to try to observe continuously for extended periods), and how to record the behavior (record every single behavior once it occurs or record if the behavior lasts for a pre-defined amount of time). Technical considerations when using direct observation include: (1) the definition of physical activity behaviors to be recorded, and (2) the selection of the most appropriate behavior recording technique. Additional considerations include the selection of the observation pacing method and the choice of software to record and analyze the data. The behaviors of interest also should be carefully defined and organized into classes of mutually exclusive behaviors. Examples of behavior classes observed include various postures (i.e., lying down, sitting, standing, and walking) or activity intensities (i.e., sedentary, light, moderate, and vigorous). With observation, it is also possible to determine time spent in a specific posture and assign an intensity category to the posture being coded. This method can be of great value when understanding behavior because environmental factors (i.e., the context of the behavior or movement) also can be assessed. Depending on the observation method, it is possible to accurately classify the type, intensity, duration, frequency, and context of activities performed.
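
The summary of coded observations into time per behavior class can be sketched in Python. This example assumes an interval-based recording scheme in which exactly one code from a set of mutually exclusive posture classes is recorded per interval. The 20-second interval length and the observation sequence are hypothetical and do not represent the protocol of any specific instrument.

```python
from collections import Counter

# Mutually exclusive posture classes, as listed in the text above.
POSTURE_CODES = {"lying down", "sitting", "standing", "walking"}
INTERVAL_SECONDS = 20  # assumed length of each observation interval


def seconds_per_code(observations, interval_seconds=INTERVAL_SECONDS):
    """Total observed time (seconds) in each posture class."""
    unknown = set(observations) - POSTURE_CODES
    if unknown:
        raise ValueError(f"undefined behavior codes: {sorted(unknown)}")
    tally = Counter(observations)
    return {code: tally[code] * interval_seconds for code in POSTURE_CODES}


# Hypothetical sequence of codes recorded for one child, one per interval.
observed = ["sitting", "sitting", "standing", "walking", "walking",
            "walking", "standing", "sitting", "lying down"]

time_by_posture = seconds_per_code(observed)
print(time_by_posture)
```

Rejecting undefined codes reflects the point made above that behavior classes should be defined in advance and kept mutually exclusive.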

Some examples of direct observation instruments include the System for Observing Fitness Instruction (SOFIT), Behaviors of Eating and Activity for Children’s Health: Evaluation System (BEACHES), System for Observing Play and Leisure Activity in Youth (SOPLAY), and System for Observing Play and Active Recreation in Communities (SOPARC). The use of standardized procedures during observation provides good objectivity, but observation of behaviors always involves some degree of subjectivity and can impose a high experimenter burden in terms of cost and time. Software is available to facilitate recording and tracking of data, but the method requires time, expertise, and practice. Overall, the direct observation method is considered to be an appropriate criterion measure of physical activity if conducted using standardized procedures and trained observers. Often, it is the only way to directly understand the context of behavior, but advances in video-based methods and ecological momentary assessment provide alternative views of behavior.^o Readers interested in direct observation are encouraged to refer to definitive technical guides for the methods and tools.^67-68

^f The original depiction of this relationship is in a book edited by Dr. Tom Rowland (Rowland TW. Aerobic fitness. In T. W. Rowland, ed. Developmental Exercise Physiology. 2nd edition. Champaign, IL: Human Kinetics, 2005. P. 89-108.). The concepts have been adapted and used in different ways in this Guide to characterize the relationships among the measures and the distinctions among the three main classes of assessments.

^g Figure 5B has been developed by David Berrigan and Richard Troiano for the purposes of this User Guide.

^h Records or logs of physical activity can sometimes include diaries depending on the definition but are often placed into a separate category of self-reports (as in this Guide). These involve recording the frequency and/or duration of activities as they occur and provide a comprehensive characterization of physical activity patterns. However, they are likely the least feasible self-report method because they place a great burden on the individuals being assessed.

ⁱ Quantitative histories are typically long questionnaires (e.g., 50 items) that are designed to assess lifetime or long-term (e.g., over the previous year) physical activity patterns. These can provide a comprehensive characterization of physical activity and capture important dimensions such as duration and frequency. However, they are likely to have a considerable amount of error when compared to other categories of self-reports.

^j These are very brief questionnaires (typically composed of a single item) that are designed to assess general physical activity levels and are often used to determine whether individuals meet a specific physical activity threshold, such as recommended guidelines. These types of self-reports provide limited characterization of physical activity levels because they do not ask about type, context, or patterns of physical activity.

^k The majority of work on accelerometry-based monitors has been conducted using waist/hip worn devices. However, investigators have moved toward using wrist-worn monitors. This transition has been fueled by the progression in consumer-based monitors as well as by evidence that compliance is enhanced when participants are asked to wear monitors on the wrist (more like a watch).

^l New pattern recognition approaches have shown promise in detecting underlying movement patterns and classifying type of activities performed, but accurate detection of the diverse range of activities performed under free-living conditions remains elusive.

^m Calibration equations are generally based on the assumption that heart rate is linearly related to energy expenditure. This assumption is particularly true for moderate- to vigorous-intensity activities. However, the assumption might not hold across sedentary and light-intensity activities.

ⁿ The portable systems use the same principles as metabolic carts but require participants to wear a backpack-type harness that holds two light-weight sensors (O₂ and CO₂) and transmission modules secured to the body that enable estimates of oxygen consumption and carbon dioxide production to be sent to and displayed on a laptop computer.

^o Ecological momentary assessment techniques provide a way to capture context of behavior. Using text messages and smartphone prompts, it is possible to capture information about the type, intensity, purpose, or setting of activity. Ecological momentary assessment offers many advantages for physical activity research but it has a number of logistical and assessment challenges. (See references 69-71).
