PUMS File Structure

Format of PUMS Files

For the housing unit population, there are two basic record types:

  • the person record
  • the housing unit record

Each record carries a serial number (SERIALNO) that links each person record to its proper housing unit record.

The group quarters population (whether sampled or imputed) has two basic record types as well:

  • the person record
  • the pseudo-housing unit record (i.e., "placeholder" record)

All of a group quarters person's data are included in their person record except the food stamp variables, which are the only data included in their "placeholder" record. A group quarters person's "placeholder" record has zero housing unit weights so it is not counted in housing unit estimates. Learn more about different record types in the Accuracy of the PUMS documentation.
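As a rough illustration of why a zero weight keeps the "placeholder" record out of housing unit estimates, the sketch below sums a housing unit weight column over a few made-up rows. The column name WGTP is the housing unit weight on the PUMS housing file; the rows and the helper name `housing_unit_total` are hypothetical.

```python
def housing_unit_total(housing_rows, weight_key="WGTP"):
    """Weighted count of housing units: the sum of the housing unit weight."""
    return sum(int(row[weight_key]) for row in housing_rows)


# Illustrative rows only, not real PUMS data.
sample_rows = [
    {"SERIALNO": "0001", "WGTP": "120"},  # housing unit record
    {"SERIALNO": "0002", "WGTP": "95"},   # housing unit record
    {"SERIALNO": "0003", "WGTP": "0"},    # group quarters "placeholder" record
]

total = housing_unit_total(sample_rows)
```

Here the file holds three records, but the placeholder contributes nothing to the weighted total, which is 215.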

The Census Bureau releases the PUMS in this format because of the tremendous amount of data contained in one record. Although these records are extremely large, they can be handled by most statistical or report-writing software. Each record has an individual weight, which allows users to produce population estimates close to those in other products showing sample data. Each record also includes replicate weights that are used to produce standard errors and to do statistical testing. For more information on using the replicate weights to calculate standard errors or the 90 percent margins of error, view the Accuracy of the PUMS documentation.
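The sketch below shows one way the replicate weights might be used. It assumes the method described in the Accuracy of the PUMS documentation: 80 person replicate weights named PWGTP1 through PWGTP80 alongside the full weight PWGTP, a standard error equal to the square root of 4/80 times the sum of squared differences between each replicate estimate and the full estimate, and a 90 percent margin of error of 1.645 times the standard error. The function names are hypothetical, and the records passed in are assumed to be already filtered to the group being estimated.

```python
import math

N_REPLICATES = 80  # number of replicate weights on each PUMS record


def weighted_total(records, weight_key):
    """Sum one weight column over the records of interest."""
    return sum(int(rec[weight_key]) for rec in records)


def estimate_with_error(records, base_weight="PWGTP"):
    """Return (estimate, standard error, 90 percent margin of error)."""
    estimate = weighted_total(records, base_weight)
    squared_diffs = sum(
        (weighted_total(records, f"{base_weight}{r}") - estimate) ** 2
        for r in range(1, N_REPLICATES + 1)
    )
    standard_error = math.sqrt(4.0 / N_REPLICATES * squared_diffs)
    return estimate, standard_error, 1.645 * standard_error
```

Passing `base_weight="WGTP"` would apply the same calculation to housing unit records, which carry their own set of replicate weights.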

Number of Housing Unit Records

Starting with the 2005 PUMS, a 1-year PUMS file contains records for about one percent of the housing units in the nation: approximately 1.3 million housing unit records and about 3 million person records.

The 3-year PUMS files contain records for about three percent of housing units, or about three times as many housing unit and person records as the 1-year files.

Similarly, the 5-year PUMS files contain about five times as many housing unit and person records as the 1-year files.

Number of Group Quarters Records

There was no group quarters sample in the 2005 ACS. The group quarters sample was added to the ACS in 2006.

From the 2006 PUMS through the 2010 PUMS, a 1-year PUMS file contained group quarters person records for about one percent of the total population living in group quarters, or about 81,000 records.

Because there was no group quarters sample in 2005, the 2005-2007 ACS 3-year PUMS file contains only about two times as many group quarters person records as the 1-year file. For the same reason, the 2005-2009 ACS 5-year PUMS file contains only about four times as many group quarters person records as the ACS 1-year PUMS file.

The ACS 3-year PUMS files (2006-2008, 2007-2009, and 2008-2010) and the 2006-2010 ACS 5-year PUMS file contain records for about three and five times, respectively, as many group quarters persons as the ACS 1-year file.

Starting with the 2011 ACS 1-year PUMS, the number of group quarters person records almost doubled, to about two percent of the total population living in group quarters. The increase comes from a new whole-person, imputation-based methodology that the Census Bureau implemented to improve small-area estimates for the group quarters population; the imputed records account for the additional group quarters person records in the ACS microdata. The responses shown on the PUMS file for imputed group quarters records are actual answers donated by sampled group quarters records. Learn more about Group Quarters Small Area Estimation.

Concatenating Large "a" and "b" Files

There is a tremendous amount of data in each PUMS record. Although these records are extremely large, they can be handled by most statistical or report-writing software.

Data users should note that PUMS files containing data for the entire United States (in contrast to individual state and state-equivalent files) are separated into multiple data files that must be concatenated to create a complete file. For example, users downloading the 2014 ACS 1-year PUMS files of United States Population Records will notice an "a" file and a "b" file. Each file contains about half of the population records in the 2014 1-year PUMS dataset for the United States. Below are instructions, in the form of a SAS program, for concatenating the two PUMS person-level files.

Concatenate the two person-level files using the set statement:

data population;
    set psam_pusa psam_pusb;  /* stack the "a" and "b" person files */
run;

To create a complete housing-level file, replace the set statement with "set psam_husa psam_husb;" and name the output data set housing, the name used in the merge instructions in the next section.
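For readers working outside SAS, the same concatenation can be sketched in Python with only the standard library. This is a minimal sketch, assuming the "a" and "b" files were downloaded in CSV format and share an identical header row; the function name `concatenate_pums_parts` and the output file name in the comment are hypothetical.

```python
import csv


def concatenate_pums_parts(part_paths, out_path):
    """Stack CSV parts that share one header row.

    Writes the header once, then every data row from each part in order.
    Returns the number of data rows written.
    """
    rows_written = 0
    header = None
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in part_paths:
            with open(path, newline="") as part_file:
                reader = csv.reader(part_file)
                part_header = next(reader)
                if header is None:
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"{path} has a different column layout")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Example call (file names assumed to match the downloaded CSV files):
# concatenate_pums_parts(["psam_pusa.csv", "psam_pusb.csv"], "psam_pus.csv")
```

The rows are streamed one at a time rather than loaded into memory, which matters for files of this size.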

Merging Person and Housing Unit Files

Many estimates can be tabulated from the person file or from the housing unit file without any merging. Some data users will need to use housing unit and person items together, for instance to analyze how the number of rooms in a home varies by a person's age. This type of analysis requires merging the housing unit and person files. The merge must rely on the SERIALNO variable, which is the same in the housing unit and person files. Below are instructions, in the form of a SAS program, for merging the housing and population PUMS files.

1. First, make sure both files are sorted by SERIALNO.

proc sort data=population;
    by serialno;
run;

proc sort data=housing;
    by serialno;
run;

2. Then merge the two files together, using SERIALNO as the merge key.

data combined;
    /* In SAS, the 'in=' option creates a flag (pop) that marks records found in the population file */
    merge population (in=pop) housing;
    by serialno;
    /* This statement keeps only those housing units that were in the population file, i.e., units that have people */
    if pop;
run;
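The same merge logic can be sketched in Python as a lookup keyed on SERIALNO. This is an illustrative sketch, not a translation of the SAS program: it assumes each file has already been read into a list of dictionaries (for example with `csv.DictReader`), the function name is hypothetical, and where both files share a column name the sketch keeps the person file's value, which is a choice made here for simplicity.

```python
def merge_person_with_housing(person_rows, housing_rows, key="SERIALNO"):
    """Attach housing unit columns to every person row that shares its key.

    Housing units with no person rows (for example, vacant units) are
    dropped, mirroring the 'if pop;' filter in the SAS step.
    """
    housing_by_serial = {row[key]: row for row in housing_rows}
    merged = []
    for person in person_rows:
        combined = dict(housing_by_serial.get(person[key], {}))
        combined.update(person)  # person values win on shared column names
        merged.append(combined)
    return merged
```

Because the lookup is a dictionary, neither input needs to be sorted first, unlike the SAS merge with a by statement.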

The suggested merge creates a person-level file: each record is one person, carrying the characteristics of that person's housing unit. Estimates of persons can therefore be tallied within categories from the housing unit file, and the person weights (PWGTP) should be used for such tallies. Please note that housing characteristics cannot be tallied from this merged file without extra steps to ensure that each housing weight (WGTP) is counted only once per household.
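The two kinds of tallies can be sketched as follows. This is a minimal sketch, assuming a merged person-level file held as a list of dictionaries that includes the person weight PWGTP, the housing weight WGTP, and SERIALNO; the function names are hypothetical, and keeping the first record seen for each SERIALNO is only one possible "extra step" for counting each housing weight once.

```python
from collections import defaultdict


def persons_by_housing_category(merged_rows, category, person_weight="PWGTP"):
    """Weighted count of persons within each value of a housing unit item."""
    totals = defaultdict(int)
    for row in merged_rows:
        totals[row[category]] += int(row[person_weight])
    return dict(totals)


def housing_units_by_category(merged_rows, category,
                              housing_weight="WGTP", key="SERIALNO"):
    """Weighted count of housing units, using each unit's weight only once."""
    totals = defaultdict(int)
    seen_serials = set()
    for row in merged_rows:
        if row[key] in seen_serials:
            continue  # this housing unit was already counted via another person
        seen_serials.add(row[key])
        totals[row[category]] += int(row[housing_weight])
    return dict(totals)
```

A housing tally built this way covers only units that have person records, because units without people were dropped by the merge; tallies that need every housing unit are better made from the housing unit file directly.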
