SECTION

7 Collecting and Reporting Data

Several factors should be considered in the planning and implementation of data collection, as well as when analyzing and reporting data. Data collection considerations are detailed in this section, organized by measurement method, and followed by general statistical considerations when working with environmental data. Many users will benefit from consulting an expert in the field of physical activity environment assessment or someone with experience using a specific tool.

GIS-based Measurement Considerations

Several considerations are important when determining whether to use GIS-based measurements in a study or project, and these are described below.

Needed Expertise

Although GIS software such as ArcGIS is user-friendly, the level of data processing and analyses involved in built environment assessments often requires GIS expertise. Fortunately, GIS experts can be found through local universities and government departments. GIS is commonly used in Geography, Public Affairs, City and Transportation Planning, and Health Departments. GIS is used with a wide variety of data, so project staff should make sure a GIS expert is recruited whose skills and interests are relevant to the study or project.

Access to Databases

One challenge of GIS-based environmental assessment is that GIS databases can be difficult to obtain. The websites of local Metropolitan Planning Organizations (MPOs) are the best places to start when searching for GIS databases because many MPOs maintain a fairly large collection of GIS data for their region, such as road networks, parcels, and land use shapefiles. Other sources of GIS data include local municipalities, which often will share their GIS databases, sometimes for a fee. GIS data can also be obtained or purchased from sources such as ESRI¹²⁶ and the U.S. Census.¹²⁷ It is also possible to add original audit or survey data into GIS.

Variable Computations

After determining geographic coordinates (e.g., for building addresses or from GPS) and obtaining the necessary geo-databases, built environment variables need to be created. Creating interpretable and useful built environment variables requires careful thought and multi-step computations involving multiple indicators and intermediary variables. Net residential density, for example, is computed by dividing the number of residential housing units by the total land area across all residential parcels. Intersection density is the total number of intersections divided by the total land area. Mixed land use variables can be more complicated to compute. The International Physical Activity and Environment Network created a detailed protocol covering computations of numerous GIS-related built environment variables. This protocol serves as an excellent resource for those working on GIS-based built environment assessments and is publicly available online.¹²⁸ A similar resource has been created mainly for application in the United States.^67,129
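The two density definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the protocol's implementation: the parcel records, field names (`land_use`, `housing_units`, `acres`), and the intersection count are hypothetical, and acres are used only as an example unit.

```python
def net_residential_density(parcels):
    """Housing units divided by the total land area of residential parcels only."""
    residential = [p for p in parcels if p["land_use"] == "residential"]
    total_units = sum(p["housing_units"] for p in residential)
    total_area = sum(p["acres"] for p in residential)
    if total_area == 0:
        raise ValueError("no residential land area in this set of parcels")
    return total_units / total_area


def intersection_density(n_intersections, total_area):
    """Number of intersections divided by the total land area."""
    if total_area <= 0:
        raise ValueError("total land area must be positive")
    return n_intersections / total_area


# Hypothetical parcels inside one study area (values are made up).
parcels = [
    {"land_use": "residential", "housing_units": 12, "acres": 2.0},
    {"land_use": "residential", "housing_units": 8, "acres": 3.0},
    {"land_use": "commercial", "housing_units": 0, "acres": 5.0},
]

density = net_residential_density(parcels)            # 20 units / 5 residential acres
area_acres = sum(p["acres"] for p in parcels)         # all land uses: 10 acres
int_density = intersection_density(15, area_acres)    # 15 intersections / 10 acres
```

Note that the two variables use different denominators: residential land only for net residential density, and all land for intersection density.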

It is important to note that GIS variables can be computed in various ways and differ across studies. This should be taken into account when comparing findings across studies and selecting computations. Appropriate computations are those with the strongest evidence in relation to physical activity, relevance to the population being studied, and ability to be compared across geographic areas.

Flexibility and Adaptability

It is important to choose a flexible measure in situations where the tool may need to be adapted or tailored to specific populations or settings. New items in audit tools and questionnaires should be pilot tested for inter-rater or test-retest reliability when possible. Tools that use smartphone or tablet applications or require complex scoring algorithms may be less flexible because of difficulties in making software or scoring modifications. GIS-based measures can be flexible; for example, a variable can be computed multiple ways, and the options will be determined by the datasets available.

Buffer Size, Type, and Origin

GIS-based built environment assessments for physical activity (e.g., of community design) have traditionally captured the spatial context around people’s homes. This method involves creating spatial boundaries around people’s homes in GIS to represent the environmental context believed to influence physical activity. The primary approaches to creating the spatial boundaries are: (1) radial/Euclidean buffers, which involve drawing a straight line from the home for a specified distance (e.g., 1 km) and using the line as the radius to create a circle, and (2) street network-based buffers, where a line is drawn a given distance from the home through the street network. Street-network buffers are believed to better represent opportunities for walking and are generally supported in previous studies, though results have been similar across buffer types.^130,131 Buffer size also is important to consider, and the selection may vary based on the population of interest. For example, smaller buffer sizes, around 500 meters, are more appropriate for younger children with limited independent mobility, whereas larger buffers, around 1 kilometer, better represent an adolescent’s walking space. Although the home neighborhood is almost always the origin of the buffer, evidence is accumulating that community design aspects outside of one’s home neighborhood, such as the school¹³² and work¹³³ “neighborhoods,” also are important. Some studies have used GPS to determine all of the locations (geographic coordinates) a person visits and calculated GIS-based physical activity environment variables from this information.^134,135 This dynamic approach allows a comprehensive assessment of environmental exposures but may be limited by data availability.
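The difference between the two buffer types can be illustrated with a toy example. The sketch below assumes planar coordinates in meters and a hand-made street network. All names, coordinates, and street lengths are hypothetical, and a real project would typically rely on GIS software rather than this simplified code.

```python
import heapq
import math


def euclidean_buffer(home_xy, destinations, radius_m):
    """Names of destinations within a straight-line radius of the home."""
    hx, hy = home_xy
    return [name for name, (x, y) in destinations.items()
            if math.hypot(x - hx, y - hy) <= radius_m]


def network_distances(edges, origin, max_dist_m):
    """Shortest street-network distance (Dijkstra) to each node within max_dist_m."""
    graph = {}
    for a, b, length_m in edges:  # streets are assumed walkable in both directions
        graph.setdefault(a, []).append((b, length_m))
        graph.setdefault(b, []).append((a, length_m))
    dist = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry; a shorter path was already found
        for neighbor, length_m in graph.get(node, []):
            nd = d + length_m
            if nd <= max_dist_m and nd < dist.get(neighbor, math.inf):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


# Hypothetical home, destinations (x, y in meters), and street segments.
home_xy = (0, 0)
destinations = {"park": (300, 400), "shop": (600, 0), "school": (0, 900)}
edges = [
    ("home", "corner", 300),
    ("corner", "park", 400),
    ("home", "shop", 600),
    ("home", "school", 1100),  # winding road: 900 m straight-line, 1100 m on foot
]

in_circle = euclidean_buffer(home_xy, destinations, 1000)
reach = network_distances(edges, "home", 1000)
in_network = [name for name in destinations if name in reach]
```

In this made-up neighborhood, the school falls inside the 1 km circle but outside the 1 km network buffer, which is the kind of discrepancy that motivates street network-based buffers.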

Observation Measurement Considerations

As with GIS-based measures, users interested in working with observation measures should first consider the following issues.

Training

Conducting observational audits can require significant resources and training, particularly for detailed research-oriented tools. Good audit tools are accompanied by a user guide that includes information on training coders. The following training steps should be used when conducting observational audits for research or when high levels of accuracy are required.

Before beginning data collection, each coder should be trained by a “master trainer” who has been trained and certified by the tool developer(s) or another master trainer.
Initial coder training should involve going through the audit tool together in the field (e.g., while walking the route being coded) while the master trainer provides definitions, instructions, and explanations for each selected code.
Coders should be deployed to audit a small number of environments (e.g., five segments or two parks) that the trainer also has audited. Each coder’s data should be compared to the trainer’s gold standard data. Each coder should exhibit a high level of agreement (e.g., >80 percent or 85 percent) with the master trainer before being certified for data collection. Project staff should continue to monitor inter-observer agreement on a small number of environments throughout data collection, then provide feedback and retraining as needed. Ongoing monitoring is needed to ensure high quality of data.

These procedures are ideal for research studies but may not be applicable to practice-based projects, such as those that engage community members in audits for educational purposes or to identify areas needing improvement. Whenever possible, community members should be trained to conduct the audits, and simple certification procedures should be implemented based ideally on agreement with the trainer or at least an assessment of inter-rater reliability.
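The certification check described above amounts to a simple percent-agreement calculation. The sketch below is one way it could be done, assuming each audit is stored as a list of item codes in the same order for coder and trainer. The codes are invented, and the 80 percent default threshold is only one of the example values mentioned above.

```python
def percent_agreement(coder_codes, trainer_codes):
    """Percentage of audit items on which the coder matches the master trainer."""
    if len(coder_codes) != len(trainer_codes):
        raise ValueError("coder and trainer must rate the same items")
    if not trainer_codes:
        raise ValueError("at least one item is required")
    matches = sum(c == t for c, t in zip(coder_codes, trainer_codes))
    return 100.0 * matches / len(trainer_codes)


def is_certified(coder_codes, trainer_codes, threshold=80.0):
    """True when agreement with the trainer's gold standard exceeds the threshold."""
    return percent_agreement(coder_codes, trainer_codes) > threshold


# Hypothetical item codes for one street segment (e.g., 0 = absent, 1 = present).
trainer = [1, 0, 1, 1, 0, 2, 1, 0, 1, 1]
coder_a = [1, 0, 1, 1, 0, 2, 1, 0, 1, 0]  # disagrees on one item
coder_b = [1, 1, 1, 0, 0, 2, 1, 0, 0, 1]  # disagrees on three items

agreement_a = percent_agreement(coder_a, trainer)
agreement_b = percent_agreement(coder_b, trainer)
```

Percent agreement does not correct for chance agreement, so projects that need a stricter standard may prefer a chance-corrected statistic.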

Selection of Environment Samples for Auditing

Because it is rarely feasible to audit all streets in a community or all parks in a city, a systematic approach to sampling is needed. One approach is to select generalizable samples of the environment/setting of interest when using an audit tool. For example, when using a streetscape audit tool, the user must decide which and how many street segments and/or routes to assess. When using a park audit tool, the user must decide which and how many parks to audit. However, little research is available to guide this selection process.

Another approach is to select samples for a specific purpose. If the goal of the study is to examine the safety of streetscapes around schools, a random sample of schools can be drawn. If the goal is to assess disparities in park quality, then low- and high-income areas can be sampled, and parks within those areas can be assessed. If the goal is to link the built environment to participants’ physical activity, the sample environments can be selected based on proximity to each participant’s home. The environments can also be selected based on the nearest cluster of destinations (e.g., shops, restaurants) to represent the most likely walking path between the participant’s home and these destinations. When the objective is to classify a neighborhood for activity-friendliness, a random sample of environments (e.g., parks, streets) can be selected to represent the neighborhood. Some evidence suggests that assessing between 25 percent and 50 percent of the street segments in the neighborhood may be sufficient to provide a representative estimate of the area.^136,137
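Drawing a random sample of street segments to represent a neighborhood can be sketched as follows. The segment identifiers are hypothetical, and the 25 percent fraction is taken from the lower end of the range mentioned above. The fixed seed is an assumption added so that the selection can be reproduced and documented.

```python
import math
import random


def sample_segments(segment_ids, fraction, seed=0):
    """Simple random sample of a fraction of a neighborhood's street segments."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = math.ceil(fraction * len(segment_ids))  # round up so small areas get >= 1
    rng = random.Random(seed)                   # seeded for a reproducible draw
    return sorted(rng.sample(segment_ids, n))


# Hypothetical list of 40 street segment IDs in one neighborhood.
segment_ids = [f"seg-{i:03d}" for i in range(1, 41)]

quarter_sample = sample_segments(segment_ids, 0.25)
half_sample = sample_segments(segment_ids, 0.50)
```

Recording the seed alongside the sampled IDs lets another team member regenerate the same list later.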

Use of Google Street View for Observational Audits

Google Street View is a feature of Google Maps that provides images from views along streets across the world. Several recent studies have investigated whether streetscape audits can be reliably and validly completed using Google Street View.^47,138,139 Each of these studies concluded that Google Street View is acceptable for such tasks. One important consideration when using Google Street View is to be cognizant of the date the images were collected. If the image is old, the environment could have changed, thus limiting the validity of the audit. Images can be taken during different seasons, so if an obstruction (e.g., tree foliage) appears in an image, the user could search for other (newer or older) images of the same area. Google Street View has been available since 2007, so several images have been captured for many areas across the world over this time span. A helpful guide for conducting streetscape audits using Google Street View was developed by Wilson and Kelly.¹⁴⁰

Smartphone and Tablet Applications

Traditionally, observational audits have been completed using pen and paper. More recently, several audit tools have been incorporated into smartphone and tablet computer applications, such as the SOPARC Online App: System for Observing Play and Recreation in Communities (iSOPARC).¹⁴¹ Advantages of applications are that data are recorded directly into the smartphone or tablet, skip-patterns can be automated, data can be transferred and stored on a secure server, and in some instances data can be scored automatically. Applications can be costly to build and maintain, but more are continuing to emerge in environmental assessment.

Photovoice

Photovoice has seen recent use in physical activity environment assessment and allows participants to capture photographs of their environment (e.g., Buman, 2013).¹⁴² The photographs, sometimes paired with written or verbal narratives, can supplement quantitative measures by providing rich qualitative information on specific environment factors relevant to the participant. Photovoice has utility in intervention studies that engage community members. It can also be useful in needs assessment projects that aim to identify and advocate for specific environmental improvements.

Statistical Considerations

In addition to measures-related considerations, users will also need to consider several statistical issues.

Nested Data

Environmental assessment often involves nested (i.e., multi-level) data when multiple participants are included from each environment. Users of environmental assessment tools need to consider the unit of analysis in their data collection and analysis efforts. When personal characteristics are of interest, such as physical activity, the participant is the unit of analysis. For example, when data from a school environment tool (i.e., audit or self- or proxy-report) are compared to physical activity data from multiple children from that school, the design is considered nested. In this circumstance, the sample size and statistical power (i.e., ability to detect a true association) are driven primarily by the number of schools rather than the number of participants. An ideal design would include a large number of schools and a small number of randomly selected students (e.g., 20-30) per school. Assessing more students per school would only minimally improve power. When analyzing nested data, mixed-effects models must be used to account for the non-independence of participants within settings.¹⁴³ Most statistical packages are capable of handling these types of models (e.g., Singer, 1998).¹⁴⁴
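One way to see why adding schools helps more than adding students per school is the standard design-effect approximation for clustered samples, 1 + (m − 1) × ICC, where m is the number of participants per cluster and ICC is the intraclass correlation. This formula is not given in the text above and is offered only as an illustration. It assumes equal cluster sizes, and the ICC of 0.10 is a hypothetical value.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC (equal cluster sizes)."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Approximate number of independent observations the nested sample is worth."""
    total_n = n_clusters * cluster_size
    return total_n / design_effect(cluster_size, icc)


ICC = 0.10  # hypothetical intraclass correlation for physical activity within schools

baseline = effective_sample_size(n_clusters=20, cluster_size=25, icc=ICC)
more_students = effective_sample_size(n_clusters=20, cluster_size=50, icc=ICC)
more_schools = effective_sample_size(n_clusters=40, cluster_size=25, icc=ICC)
```

Under these assumptions, doubling the number of students per school raises the effective sample size from about 147 to about 169, whereas doubling the number of schools raises it to about 294.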

Standardized and Unstandardized Regression Coefficients

Unstandardized regression coefficients are interpreted as the change in the dependent variable for every one-unit change in the independent variable. The unit of measurement is retained, so if the dependent variable is minutes of physical activity, for example, the unstandardized coefficient is expressed in minutes of physical activity. For example, an unstandardized regression coefficient of B = -8.6 for the association between distance to the nearest park in kilometers (independent variable) and minutes per day of physical activity (dependent variable) means that for every additional one kilometer in distance to the nearest park, participants had 8.6 fewer minutes per day of physical activity. So a person living one kilometer from a park would have 8.6 fewer minutes per day of physical activity than a person living zero kilometers from a park, and a person living two kilometers from a park would have 17.2 fewer minutes per day of physical activity than a person living zero kilometers from a park. Thus, unstandardized regression coefficients are particularly useful when the unit of measurement is meaningful, such as a quantity of something (e.g., minutes of physical activity, number of parks, number of intersections per acre). Standardized regression coefficients involve standardizing the variances of the dependent and independent variables to one, similar to creating z-scores. Standardized coefficients cannot be interpreted in the unit of measurement of either the dependent or independent variable, but are useful for comparing effect sizes across variables and statistical models. Both types of coefficients add value and are commonly used in environmental assessment, but unstandardized coefficients are the most informative for decision making.
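The park-distance example can be worked through in code, together with the usual conversion from an unstandardized to a standardized coefficient (multiply by the standard deviation of the independent variable and divide by that of the dependent variable). The coefficient B = -8.6 comes from the example above. The two standard deviations are hypothetical values chosen only for illustration.

```python
def predicted_difference(b, delta_x):
    """Expected change in the dependent variable for a change of delta_x units."""
    return b * delta_x


def standardized_coefficient(b, sd_x, sd_y):
    """Convert an unstandardized coefficient B to a standardized coefficient."""
    return b * sd_x / sd_y


B = -8.6  # minutes/day of physical activity per kilometer to the nearest park

diff_1km = predicted_difference(B, 1)  # one kilometer farther from a park
diff_2km = predicted_difference(B, 2)  # two kilometers farther from a park

# Hypothetical standard deviations: 0.5 km for distance, 20 minutes/day for activity.
beta = standardized_coefficient(B, sd_x=0.5, sd_y=20.0)
```

With these assumed standard deviations, the standardized coefficient is about -0.215, meaning a one standard deviation increase in distance corresponds to roughly a fifth of a standard deviation less physical activity.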

Multicollinearity

Independent variables with high correlations should be grouped into scales or indices to prevent multicollinearity, which violates the assumptions of multiple regression analysis. Investigators commonly use a threshold correlation of r = 0.7 or r = 0.8 to denote “high” correlations among independent variables. Highly correlated items can be combined into scales using factor analysis or by summing or averaging values across items. See Section 4 for more information on scales and indices.
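A simple screening step for multicollinearity can be sketched as below: compute pairwise Pearson correlations, flag pairs at or above the chosen threshold, and combine the flagged variables into an index. The variable names and values are hypothetical. Averaging z-scores (rather than raw values) is an assumption made here because the example variables are on different scales.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def high_correlation_pairs(variables, threshold=0.7):
    """Pairs of variable names whose absolute correlation meets the threshold."""
    return [(a, b) for a, b in combinations(variables, 2)
            if abs(pearson_r(variables[a], variables[b])) >= threshold]


def zscores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def average_index(variables, names):
    """Average the z-scores of the named variables for each observation."""
    standardized = [zscores(variables[n]) for n in names]
    return [sum(obs) / len(obs) for obs in zip(*standardized)]


# Hypothetical GIS variables for six neighborhoods.
variables = {
    "residential_density": [2, 4, 6, 8, 10, 12],
    "intersection_density": [1, 2, 3, 5, 5, 6],
    "park_count": [3, 1, 4, 1, 5, 2],
}

flagged = high_correlation_pairs(variables, threshold=0.7)
index = average_index(variables, ["residential_density", "intersection_density"])
```

Here only the two density variables are flagged, so they are combined into one index, while the park count can remain a separate independent variable.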

Measures Registry USER GUIDES