MEASURES REGISTRY USER GUIDE
Evaluating Individual Diet Measures
In examining the appropriateness of different measures for assessing dietary behavior, we are interested in their psychometric properties. These include validity and reliability, and the associated concept of measurement error.
Validity refers to the extent to which a measurement reflects true dietary behavior. There are different types of validity (Box 6) and they are tightly linked to one another.
Box 6. Types of Validity
- Criterion-related validity: refers to the extent to which a measure accurately predicts an external criterion. It is assessed by examining how closely the measure agrees with another valid measure. An example within dietary behavior is a comparison of intake data captured using a self-report measure with true intake documented using an objective measure, such as a recovery biomarker or observation.
- Construct validity: refers to the extent to which observed relationships between the measure (e.g., a scale) and other variables are as expected. For example, the relationship between measured eating attitudes and dietary intake might be assessed.
- Content validity: refers to whether items accurately represent the underlying construct (e.g., a particular dietary behavior) that is being measured. For example, depending on the definition employed, a measure of sugar-sweetened beverage consumption should include fruit juice and flavored milk, as well as soda.
- Face validity: refers to the extent to which a measure appears, on its face, to capture the concept it is intended to measure. This is often assessed through a review by expert judges. Face validity is a relevant consideration, for example, for a questionnaire about sugar-sweetened beverages, for which there is a lack of consensus on definitions.
Reliability refers to the consistency with which a behavior is measured (Box 7). Different types of reliability may be particularly salient depending on the research question and study design.
Box 7. Types of Reliability
- Inter-rater reliability: Agreement among raters. Within the context of dietary behavior, this might be relevant in a situation in which multiple trained observers document intake. Inter-rater reliability can be assessed using a correlation coefficient or Cohen’s kappa.
- Test-retest reliability: The correlation between two administrations of the measure to the same respondent, or repeatability. Test-retest reliability is of relevance in studies in which dietary behavior is measured at multiple time points, for example, to detect change before and after exposure to an intervention. Test-retest reliability can be assessed using a correlation coefficient.
- Internal consistency: The extent to which items within a measure (e.g., a scale) measure the same behavior. Internal consistency is relevant, for example, to a scale intended to measure various aspects of dietary patterns through different subscales (e.g., fruit/vegetables, dairy products, meats) and can be assessed using Cronbach’s alpha.
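The statistics named in Box 7 can be illustrated with a minimal sketch. The function names and example data below are hypothetical and are not part of the Measures Registry. The sketch assumes two raters coding the same items for Cohen's kappa, paired scores from two administrations for the correlation coefficient, and complete item responses from the same respondents for Cronbach's alpha.

```python
from statistics import mean, variance


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical codes to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which the raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: computed from each rater's marginal category proportions.
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def pearson_r(first, second):
    """Pearson correlation, e.g., between two administrations of a measure."""
    m1, m2 = mean(first), mean(second)
    sxy = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    sxx = sum((a - m1) ** 2 for a in first)
    syy = sum((b - m2) ** 2 for b in second)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of respondent scores per item."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    # Total scale score for each respondent (sum across items).
    totals = [sum(responses) for responses in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical data: two observers coding whether each of six items was eaten.
observer_1 = ["ate", "ate", "left", "ate", "left", "left"]
observer_2 = ["ate", "ate", "left", "left", "left", "left"]
kappa = cohens_kappa(observer_1, observer_2)
```

In practice, an established statistical package would typically be used; the sketch is intended only to make the calculations concrete.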
Within the field of dietary intake assessment, we often discuss the extent of measurement error in self-report data.102 Measurement error refers to the difference between the true value of a parameter, such as intake of a dietary component, and the value estimated using a measure, such as a 24-hour recall or food frequency questionnaire. There are two types of measurement error: random and systematic. With random error, individual errors may fall in the direction of either under- or over-estimation, and with a sufficient number of observations they tend to average to zero. Thus, the measurements are imprecise but not biased. For dietary intake data, day-to-day variation in what individuals eat and drink is the main source of random error, affecting primarily short-term instruments. In other words, intake captured on a given day is affected by excess variation due to differences in what individuals consume from day to day (known as day-to-day, or within-person, variation). Random error is related to the reliability of data collected using a measure.
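The behavior of random error can be sketched with a small simulation. This is an illustration under stated assumptions, not a model of any particular instrument: the usual intake of 2000 kcal and the day-to-day standard deviation of 500 kcal are hypothetical values, and errors are assumed to be independent across days and normally distributed with mean zero.

```python
import random
from statistics import mean


def simulate_daily_reports(true_usual_intake, within_person_sd, n_days, seed=0):
    """Return n_days of reported intake containing zero-mean random error only."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [true_usual_intake + rng.gauss(0, within_person_sd) for _ in range(n_days)]


def sd_of_mean(within_person_sd, n_days):
    """Standard deviation of the mean of n_days independent days of intake."""
    return within_person_sd / n_days ** 0.5


# Hypothetical values: 2000 kcal usual intake, 500 kcal day-to-day SD.
reports = simulate_daily_reports(2000, 500, n_days=10_000)
single_day_error = abs(reports[0] - 2000)   # error in one day of data
long_run_error = abs(mean(reports) - 2000)  # error after averaging many days
```

Under these assumptions, the imprecision of a mean shrinks with the square root of the number of days, which is why repeat administrations of short-term instruments help to reduce the impact of within-person variation.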
With systematic error, measurements depart from truth in a consistent direction, such that the data are biased toward either under- or over-reporting (thus, systematic error is also known as bias). Contributors to systematic error within dietary intake measures include recall biases, reactivity, social desirability biases,35 cognitive factors such as limited attention span, food and body attitudes and perceptions, body weight status,38,103 and food habits and the complexity of diet.36 In capturing children’s dietary behaviors, whether the child or parent reports dietary behavior can also play a role,104 as can the recency of the reporting period95,105 and the use of portion size aids designed for adults.41 Systematic error is related to the validity of data collected using a measure.
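The distinction between systematic and random error can be made concrete by summarizing the differences between reported values and a reference. The sketch below is a simplification: it assumes the reference values are unbiased (for example, derived from a recovery biomarker), and the function name and data are hypothetical.

```python
from statistics import mean, stdev


def error_summary(reported, reference):
    """Summarize the differences between reported and reference values.

    The mean difference estimates group-level bias (systematic error);
    the standard deviation of the differences reflects the spread around
    that bias, which includes random error.
    """
    if len(reported) != len(reference) or len(reported) < 2:
        raise ValueError("need at least two paired observations")
    differences = [r - t for r, t in zip(reported, reference)]
    return {"bias": mean(differences), "sd_of_differences": stdev(differences)}


# Hypothetical energy intakes (kcal/day) for four respondents.
reported_intake = [1800, 2100, 1700, 2400]
reference_intake = [2000, 2300, 2000, 2500]
summary = error_summary(reported_intake, reference_intake)
```

Unlike random error, a nonzero mean difference does not shrink as more observations are collected, so additional days of data do not remove bias.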
In selecting measures to assess dietary intake or other diet-related behaviors, it is critical to consider psychometric properties, including whether these properties have been evaluated in populations similar to the target study population and how they were examined (e.g., against what reference measures and using what statistics or other procedures). Different properties may be particularly salient to the study design. For example, test-retest reliability is an important consideration in the selection of a measure used in cohort or intervention studies, in which multiple points of data collection are required. Further, measures of dietary behavior (aside from intake) should be assessed for construct validity because of the ambiguity underlying some concepts (e.g., food addiction, eating attitudes) that cannot be assessed through biological markers.106
Studies to assess the extent of error in measures are referred to as validation studies. In dietary intake assessment, validation studies include a reference measure that is a marker of true intake, such as a recovery biomarker. A number of biomarker-based validation studies have been conducted to assess error in self-report measures. Many of these have been completed among adults and tend to show that data collected using 24-hour recalls are affected by significant random error but less systematic error or bias than data captured using food frequency questionnaires.65,66,107,108 Food records share characteristics with 24-hour recalls in terms of bias, with the exception of the contribution of reactivity. Screeners cannot be assessed using recovery biomarkers because they do not capture total diet. Findings from biomarker-based studies with adults, including pooled validation studies, have informed recommendations to avoid basing estimates of energy intake on self-report data given known biases.67 However, results for other dietary components, such as protein density and potassium, show that these are less biased,65,66 indicating that self-report data have value for understanding eating patterns more broadly. Validation studies also have informed strategies related to combining instruments109 and the use of appropriate analytic techniques to mitigate error.1,110,111
Measuring diet is complicated when the intent is to examine relationships with body weight or characteristics related to body weight, because body mass index has been shown to be a strong predictor of misreporting of dietary intake, and particularly of energy misreporting.36,103,112 A systematic review of studies conducted with children from birth to age 18 years113 examined the validity of self-report methods for energy intake in relation to doubly labeled water. It found both under-reporting and over-reporting of intake, ranging from 2 percent to 59 percent depending on the self-report measure and the study population. The authors concluded that the 24-hour recall, using multiple passes and proxy reporters, was the most accurate method for young children, whereas weighed food records were most accurate for older children. However, as noted, the evidence suggests that, overall, self-report is not the optimal method of assessment of energy intake. Other indicators, such as changes in weight, are a preferred measure of energy balance in relation to other factors, such as interventions.
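Percentages of under- and over-reporting such as those above are typically expressed relative to the reference value. A minimal sketch follows, assuming that respondents are in energy balance so that energy expenditure measured by doubly labeled water can stand in for true energy intake. The function name and values are hypothetical and are not drawn from the review.

```python
def percent_misreporting(reported_energy, expenditure_dlw):
    """Percent difference between reported energy intake and energy expenditure.

    Negative values indicate under-reporting; positive values indicate
    over-reporting. Both arguments must use the same unit (e.g., kcal/day).
    """
    if expenditure_dlw <= 0:
        raise ValueError("energy expenditure must be positive")
    return 100 * (reported_energy - expenditure_dlw) / expenditure_dlw


# Hypothetical example: 1700 kcal/day reported against 2000 kcal/day expended.
under = percent_misreporting(1700, 2000)
over = percent_misreporting(2200, 2000)
```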
Because of barriers to the use of recovery biomarkers and the lack of such markers for most dietary components, studies to evaluate the validity of self-report measures often use as the reference another error-prone self-report measure that is assumed to be less biased than the measure being evaluated. Such studies are sometimes referred to as comparative or relative validation studies.1 The findings of these studies need to be interpreted carefully because the two measures may share not only true intake but also correlated errors, which can produce misleading results. A number of studies of this nature have been conducted to assess how well self-report tools capture intake among children. For example, parental reports of beverage intakes among infants using a food frequency questionnaire compared to a three-day food record suggest higher correlations for milk than for water, juice/drinks, and soft drinks. Findings for different foods and nutrients tend to vary by dietary component and tool, with both under-reporting and over-reporting possible. A literature review conducted by the National Institutes of Health provides further details of validation studies conducted in various age groups, including infants and toddlers, preschoolers, school-age children, and adolescents.114
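The caution about correlated errors can be illustrated with a simple calculation. The sketch below assumes an additive error model in which each self-report measure equals true intake plus an error that is independent of true intake. The function name and the variance values are hypothetical.

```python
from math import sqrt


def observed_correlation(var_true, var_err_test, var_err_ref, err_corr):
    """Correlation between two error-prone measures of the same true intake.

    Assumes test = truth + e_test and reference = truth + e_ref, with both
    errors independent of truth and correlated with each other at err_corr.
    """
    cov_errors = err_corr * sqrt(var_err_test * var_err_ref)
    return (var_true + cov_errors) / sqrt(
        (var_true + var_err_test) * (var_true + var_err_ref)
    )


# Hypothetical variances (arbitrary units), equal for truth and both errors.
independent_errors = observed_correlation(1.0, 1.0, 1.0, err_corr=0.0)
correlated_errors = observed_correlation(1.0, 1.0, 1.0, err_corr=0.5)
```

Under these assumptions, the correlation between the two measures is higher when their errors are correlated, even though neither measure is any closer to true intake. This is one reason agreement between two self-report tools may overstate validity.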
It should be noted that most validation studies have been conducted in the context of epidemiology, for example, to better understand the extent to which error attenuates (i.e., biases toward the null) observed relationships between dietary exposures and health or disease outcomes. Research examining error associated with being exposed to an intervention (i.e., intervention-related bias) is lacking. However, existing research (among adults) suggests that this error poses a problem in terms of contributing to differential error between intervention and comparison groups that can affect the results of analyses.115 For example, in a randomized controlled trial, women who had been exposed to messages related to the benefits of fruits and vegetables subsequently reported higher intake of these foods on both frequency questionnaires and a targeted 24-hour recall compared to women not exposed to the potentially biasing messages.116 Research is needed to understand intervention-related biases among children and to inform strategies to address them. Surprisingly little research has been conducted to examine the sensitivity of measures of dietary behavior to change, which is fundamental to assessing the impact of interventions.
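The attenuation described at the start of this discussion can be sketched under the classical measurement error model, in which the reported exposure equals the true exposure plus random error that is independent of both the true exposure and the outcome. This is a simplification; it does not represent systematic or differential error such as intervention-related bias. The function names and values are hypothetical.

```python
def attenuation_factor(var_true, var_error):
    """Share of variance in the reported exposure that reflects true exposure."""
    if var_true <= 0 or var_error < 0:
        raise ValueError("var_true must be positive and var_error non-negative")
    return var_true / (var_true + var_error)


def observed_slope(true_slope, var_true, var_error):
    """Expected regression slope when the exposure carries classical random error."""
    return true_slope * attenuation_factor(var_true, var_error)


# Hypothetical example: error variance three times the true between-person variance.
factor = attenuation_factor(1.0, 3.0)
slope = observed_slope(0.4, 1.0, 3.0)
```

In this hypothetical case, a true slope of 0.4 would be expected to appear as a much weaker association, illustrating how random error biases estimates toward the null.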
Overall highlights from validation and other evaluative studies are outlined in Box 8.
Box 8. Highlights from Studies to Assess the Validity of Dietary Intake Data Collected Using Self-Report Tools
- Validation studies of dietary intake measures using recovery biomarkers as a measure of true intake have illustrated the extent of error for energy and selected nutrients, including protein and potassium.65,66,107,108 It has been well established that energy is reported with substantial error and for this reason, it is recommended that self-report data not be used to estimate absolute energy intake.67
- Misreporting has been shown to be associated with body weight status, with greater under-reporting of energy with higher body mass index. For this reason, it is challenging to assess associations between energy intake and body weight.
- Less is known about misreporting of foods, but there is reason to believe that foods and beverages perceived as less healthy may be subject to greater error due to social desirability bias. Research with adults has suggested that under-reporters (identified using equations to predict metabolic rate) report foods rich in fat and/or carbohydrates (e.g., sugars, cakes, pastries, French fries) less frequently than do non-under-reporters. Further research is needed to better understand misreporting of particular dietary components.
- Cautions regarding misreporting associated with body weight status must be extended to analyses using variables that may be correlated with body weight, including race/ethnicity, education and other dietary behaviors, such as restrained eating behaviors or body image. For example, interrelationships among body weight and race/ethnicity may make it difficult to assess the contributions of dietary patterns to differential rates of obesity.
In sum, it is critical in all research endeavors to consider the extent to which the measures to be used will provide high-quality data to address the research question. Examining the psychometric properties is a way to consider whether this is the case. The Measures Registry can be very useful in this regard because it provides an overview of available studies on validity and reliability for each included measure. It is important that researchers contemplating the use of a particular measure refer to the original citations to examine how measures were evaluated, in what populations, and the implications for fit for the given study.