Do you ever join point data to census polygons or other spatial units containing contextual information? Alek Berg and colleagues offer new insight into how those joins can go wrong, and how to fix them.

Aleksander Berg is a PhD candidate in the Department of Geography, advised by Stefan Leyk. Since 2023 he has been a graduate research assistant on a National Institute on Aging grant studying the midlife (ages 25-64) mortality crisis in the United States. A core question of this project is: does place matter? To help answer it, Berg has, over the past three years, helped develop a dataset of death records for fifteen U.S. states from 1990 to 2022, located at scales as fine as the residence address and covering more than 7 million decedents. These records are then connected to Census spatial units, such as blocks, that carry important contextual information about populations, the economy, and neighborhoods.
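
Connecting a geocoded record to a census block is, at its core, a point-in-polygon test. The Python sketch below illustrates the idea with the classic ray-casting (even-odd) rule. It is not the authors' pipeline: it assumes projected coordinates in meters, simple polygons without holes, and made-up block IDs ("A", "B") and record IDs. Production workflows usually rely on a GIS library's spatial join instead (GeoPandas' `sjoin`, for example).

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in a projected coordinate system, assumed meters
Ring = List[Point]           # polygon exterior ring; holes ignored in this sketch


def point_in_polygon(pt: Point, ring: Ring) -> bool:
    """Even-odd rule: cast a ray toward +x and count how many edges it crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_to_blocks(points: Dict[str, Point], blocks: Dict[str, Ring]) -> Dict[str, Optional[str]]:
    """Map each record ID to the first block whose polygon contains its point, or None."""
    return {
        rec_id: next((block_id for block_id, ring in blocks.items() if point_in_polygon(pt, ring)), None)
        for rec_id, pt in points.items()
    }


# Two adjacent 100 m x 100 m toy blocks and three hypothetical geocoded records.
blocks = {
    "A": [(0, 0), (100, 0), (100, 100), (0, 100)],
    "B": [(100, 0), (200, 0), (200, 100), (100, 100)],
}
points = {"r1": (50.0, 50.0), "r2": (150.0, 50.0), "r3": (250.0, 50.0)}
assignments = join_to_blocks(points, blocks)
print(assignments)  # r3 falls outside both blocks, so it maps to None
```

Because the test only asks which polygon contains the point, a point displaced a few meters across a block boundary is silently assigned to the neighboring block, which is exactly the failure mode the article describes next.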

A key problem emerged during the development of these data: a substantial number of death records were joined to census blocks where they could not plausibly, or even possibly, belong, such as blocks that are unpopulated. This type of error is termed overlay uncertainty. The discovery led to a dedicated analysis of the problem and a resulting paper.
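
Once records carry a block ID, flagging the most obvious overlay errors can be as simple as looking up each block's census population. The sketch below is a hypothetical illustration, not the study's code: the record fields `state`, `year`, and `block`, the toy block IDs, and five-year bins starting in 1990 are all assumptions. It computes the percentage of records that fall in zero-population blocks for each state and five-year period, the kind of summary shown in Figure 3.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def period_start(year: int, base: int = 1990, width: int = 5) -> int:
    """Map a year to the first year of its bin, e.g. 1990-1994 -> 1990 (assumed binning)."""
    return base + width * ((year - base) // width)


def zero_pop_share(records: List[dict], block_pop: Dict[str, int]) -> Dict[Tuple[str, int], float]:
    """Percentage of records joined to zero-population blocks, per (state, period)."""
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    flagged: Dict[Tuple[str, int], int] = defaultdict(int)
    for rec in records:
        key = (rec["state"], period_start(rec["year"]))
        totals[key] += 1
        # Assumes every joined block appears in the population table (KeyError otherwise).
        if block_pop[rec["block"]] == 0:
            flagged[key] += 1
    return {key: 100.0 * flagged[key] / totals[key] for key in totals}


# Toy data: hypothetical block "b1" is unpopulated (think of a railroad right-of-way).
block_pop = {"b1": 0, "b2": 40, "b3": 12}
records = [
    {"state": "CO", "year": 1991, "block": "b1"},
    {"state": "CO", "year": 1993, "block": "b2"},
    {"state": "CO", "year": 1994, "block": "b2"},
    {"state": "CO", "year": 1996, "block": "b1"},
    {"state": "MA", "year": 2021, "block": "b3"},
]
shares = zero_pop_share(records, block_pop)
print(shares)
```

A population check like this only catches records that land in clearly impossible places; a record geocoded into the wrong populated block would pass it unnoticed.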

The paper, “Why It's So Hard to Match Residence Addresses to Census Blocks—And How to Fix It,” draws on a subset of the larger midlife mortality dataset covering six states: California, Colorado, Florida, Massachusetts, Michigan, and New Jersey. In it, Berg and co-authors Myron Gutmann (CU Institute of Behavioral Science), Stefan Leyk (CU Department of Geography), and Hoeyun Kwon (Lehman College, City University of New York) (1) test how often records are joined to implausible blocks, (2) reveal the systematic problems that may generate these mismatches, and (3) offer ways to fix the problem.

Figure 3 (from the cited manuscript). Chart of the percentage of midlife deaths in zero-population blocks in each state per five-year time period from 1990 to 2022.

The study reveals that up to 4% of records are placed into blocks with no population (see Figure 3). Around half of these misplaced death records end up in census blocks that make up interstitial spaces, such as railroad rights-of-way and boulevards. Berg et al. find that many of these misplacements stem from very slight errors in the geocoding process (that is, matching address text to a point on Earth) that place a death record only a few meters outside a populated census block. The authors close the manuscript by suggesting a series of corrective measures, such as aggregating records up to a coarser spatial unit like the census tract, or reallocating them to the nearest populated census block (see Figure 7).
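
Both corrections named above can be sketched in a few lines of Python. Aggregation to tracts can lean on the structure of census GEOIDs: assuming 2010- or 2020-style 15-digit block GEOIDs (2-digit state, 3-digit county, 6-digit tract, 4-digit block), the tract GEOID is the first 11 digits. For reallocation, the sketch picks the populated block whose boundary is closest to the record's point, with an optional distance cap. The boundary-distance rule, the `max_dist` parameter, the toy polygons, and the block IDs are assumptions for illustration, not the authors' exact procedure.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # projected coordinates, assumed to be in meters
Ring = List[Point]           # polygon exterior ring; holes ignored in this sketch


def to_tract(block_geoid: str) -> str:
    """Tract GEOID = state (2) + county (3) + tract (6) digits of a 15-digit block GEOID."""
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {block_geoid!r}")
    return block_geoid[:11]


def dist_point_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line, clamping to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def dist_point_ring(p: Point, ring: Ring) -> float:
    """Distance from p to the polygon boundary (the distance to the polygon when p is outside it)."""
    n = len(ring)
    return min(dist_point_segment(p, ring[i], ring[(i + 1) % n]) for i in range(n))


def nearest_populated_block(
    p: Point,
    blocks: Dict[str, Ring],
    block_pop: Dict[str, int],
    max_dist: Optional[float] = None,
) -> Optional[Tuple[str, float]]:
    """Return (block_id, distance) of the closest block with population > 0, or None.

    Intended for points already known to fall in a zero-population block, so they
    lie outside every populated block and boundary distance is the relevant measure.
    """
    candidates = [
        (dist_point_ring(p, ring), block_id)
        for block_id, ring in blocks.items()
        if block_pop.get(block_id, 0) > 0
    ]
    if not candidates:
        return None
    dist, block_id = min(candidates)
    if max_dist is not None and dist > max_dist:
        return None  # too far to reallocate with confidence; leave for manual review
    return block_id, dist


# Toy layout: a 6 m wide unpopulated strip "R" between two populated blocks.
blocks = {
    "P1": [(0, 0), (100, 0), (100, 100), (0, 100)],
    "R": [(100, 0), (106, 0), (106, 100), (100, 100)],
    "P2": [(106, 0), (206, 0), (206, 100), (106, 100)],
}
block_pop = {"P1": 25, "R": 0, "P2": 30}
record = (101.5, 50.0)  # hypothetical record geocoded 1.5 m into the strip

print(to_tract("123456789012345"))  # hypothetical GEOID
print(nearest_populated_block(record, blocks, block_pop))
print(nearest_populated_block(record, blocks, block_pop, max_dist=1.0))
```

The two options trade off differently: aggregating to tracts gives up block-level detail but absorbs small positional errors, while reallocation keeps block-level detail but assumes the nearest populated block is the right one. A distance cap such as `max_dist` is one way to limit that assumption to the few-meter misses the study describes.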

Figure 7 (from the manuscript). Visualizing potential automated corrections for overlay uncertainty.

Overall, the results and discussion in this paper have important implications beyond health geography, because point data derived from addresses (e.g., crime and environmental hazards data) are widely used and often need to be connected to census population data. As geographers' datasets become larger and more place-specific, paying attention to how small errors are amplified at scale becomes ever more important.