FIPS Codes!

To make Crowd Counting Consortium (CCC) data easier to integrate with other datasets commonly used by scholars and journalists studying social and political behavior in the United States, we have added FIPS codes to the version shared on the Nonviolent Action Lab’s GitHub repository (here).

Federal Information Processing Series (FIPS) codes are numbers that identify geographic areas. They include two-digit codes for states; three-digit codes for counties and county-like entities (e.g., Louisiana parishes, Alaskan boroughs, and independent cities in various states); six-digit codes for census tracts; and four-digit codes for blocks within tracts. As the U.S. Census Bureau explains, these numbers can be combined to generate unique IDs for geographic entities.
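To make the combination concrete, here is a minimal Python sketch that splits a 15-digit block identifier into the four pieces described above (2 + 3 + 6 + 4 digits). The function name `split_block_geoid` and the sample ID are hypothetical and used only for illustration; nothing here is part of the CCC code.

```python
# Field widths as described above: state (2), county (3), tract (6), block (4).
FIELD_WIDTHS = (("state", 2), ("county", 3), ("tract", 6), ("block", 4))


def split_block_geoid(geoid: str) -> dict:
    """Split a 15-digit block identifier into its component codes.

    Hypothetical helper for illustration only.
    """
    if len(geoid) != 15 or not (geoid.isascii() and geoid.isdigit()):
        raise ValueError(f"expected 15 digits, got {geoid!r}")
    parts = {}
    position = 0
    for name, width in FIELD_WIDTHS:
        # Slice out each fixed-width field in order.
        parts[name] = geoid[position:position + width]
        position += width
    return parts


# Illustrative ID only: state 01, county 001, tract 020100, block 1000.
print(split_block_geoid("010010201001000"))
```

Keeping every piece as a string matters here, because several of the fields can start with a zero.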

FIPS codes for smaller geographic entities are usually unique within larger geographic entities. For example, FIPS state codes are unique within the nation and FIPS county codes are unique within a state. Since counties nest within states, a full county FIPS code identifies both the county and the state that contains it. For example, 49 counties across the 50 states share the county code “001”. To make county FIPS codes unique nationwide, the state FIPS code is added to the front of each county code (01001, 02001, 04001, etc.), so that the first two digits refer to the state the county is in and the last three digits refer specifically to the county.
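A small Python sketch of that prefixing step, assuming the state and county codes arrive as separate values. The helper name `county_fips` is hypothetical; the three state/county pairs are the ones from the example above.

```python
def county_fips(state_code, county_code) -> str:
    """Build a five-digit county code from a state code and a county code.

    Hypothetical helper: zero-pads each part so that 1 -> "01" and 1 -> "001".
    """
    state = str(state_code).zfill(2)
    county = str(county_code).zfill(3)
    if len(state) != 2 or len(county) != 3 or not (state + county).isdigit():
        raise ValueError(f"bad codes: {state_code!r}, {county_code!r}")
    return state + county


# Three counties that all carry the county code "001" within their states.
pairs = [("01", "001"), ("02", "001"), ("04", "001")]
full_codes = [county_fips(state, county) for state, county in pairs]

print(full_codes)  # ['01001', '02001', '04001']
```

On their own the three county codes are identical; with the state prefix attached, each one is distinct.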

The compiled version of the CCC data now includes a column called fips_code that shows the unique five-digit (state + county) code for the county or county-like entity in which each event occurred (assuming that coders were able to associate the event with a specific locality). These codes make it much faster and cleaner to merge the CCC data or summaries of them with U.S. Census Bureau files and other U.S.-specific resources that already include FIPS codes for this purpose. Faster and cleaner merging, in turn, should make it easier for us and others to study the relationship between protest dynamics, social-structural conditions, and other forms of political behavior (e.g., voting).
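As a rough illustration of that kind of merge, the Python sketch below counts events per county and joins the counts to a second table keyed on the same five-digit code. Only the column name fips_code comes from the CCC data; the other field names, the toy rows, and the placeholder population figures are made up for the example.

```python
from collections import Counter

# Toy event rows. Only "fips_code" is a real CCC column name.
events = [
    {"event_id": "a", "fips_code": "01001"},
    {"event_id": "b", "fips_code": "01001"},
    {"event_id": "c", "fips_code": "04001"},
    {"event_id": "d", "fips_code": None},  # no specific locality recorded
]

# Hypothetical county-level lookup with placeholder values (not real figures).
county_population = {"01001": 1000, "02001": 400, "04001": 2500}


def events_per_county(rows):
    """Count events by county code, skipping rows without a code."""
    return Counter(row["fips_code"] for row in rows if row["fips_code"])


def join_counts(counts, lookup):
    """Attach the lookup value to each county that has at least one event."""
    return [
        {"fips_code": fips, "n_events": n, "population": lookup.get(fips)}
        for fips, n in sorted(counts.items())
    ]


merged = join_counts(events_per_county(events), county_population)
for row in merged:
    print(row)
```

Because both sides use the same string key, the join needs no name matching or fuzzy cleanup of place names.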

If you’d like to see code we’re using to generate these IDs—mostly leaning on the lookup_code() function from the R package tigris, with custom handling of various exceptions—you can find it on the NAL repo, too (here). If you hit any snags or find any errors in this field in the dataset, please open an issue on GitHub to let us know. (And note the advice in the data dictionary on how to handle the likely omission of leading 0s when the data are ingested.)
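The data dictionary has our own advice on this; the Python sketch below is just one generic way to restore dropped leading zeros after a tool has read the codes as numbers. The function name `restore_fips` is hypothetical, and the handling of empty and "nan" values is an assumption about what an ingest step might produce.

```python
def restore_fips(value, width=5):
    """Return a zero-padded code string, or None if the value is missing.

    Hypothetical helper: accepts ints, floats like 1001.0, or strings.
    """
    if value is None:
        return None
    text = str(value).strip()
    if text == "" or text.lower() == "nan":
        return None
    if text.endswith(".0"):
        # A numeric column with missing values is often read in as floats.
        text = text[:-2]
    if not text.isdigit() or len(text) > width:
        raise ValueError(f"cannot interpret {value!r} as a FIPS code")
    return text.zfill(width)


for raw in (1001, "4001", 1001.0, "36061", ""):
    print(repr(raw), "->", restore_fips(raw))
```

Where the tool allows it, reading the column as text in the first place avoids the problem entirely.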

Happy number-crunching!

Published by Jay Ulfelder

Jay Ulfelder has worked for more than two decades at the intersection of social science and data science, with particular interests in contentious politics, democracy, and forecasting.
