NSF Funds Research That Counts: Breakthrough in Social Statistics
   

The United States contains over 85,000 local government entities, from townships to airport authorities and school districts. And if the maxim that "all politics is local" holds true, political science has been missing much of political reality. The problem lies in faulty data analysis methods used to make an educated guess (an inference) about the behavior of individuals from data on group behavior, called aggregate or ecological data.

This "ecological inference problem" was first recognized in 1919, in an effort to calculate how newly enfranchised women would cast their ballots nationwide. While some statistics were available from prior state elections, the ecological inference problem prevented scholars from distinguishing men's and women's votes within the same electoral precinct. Despite modern statistics, ecological inferences are still required in political science research when individual-level surveys are:

  • unavailable, as in local electoral politics because of the secret ballot
  • unreliable, as in racial politics, where respondents don't always report opinions accurately
  • insufficient, as in a vast political geography
  • too expensive
  • unfeasible, such as questions relating to historical times

Ecological inferences are also required for other social sciences as well as marketing, education, public policy, geography, history, medicine, statistics and more. Almost all researchers who use aggregate data have encountered some form of the ecological inference problem. They may finally have relief with a groundbreaking solution from Harvard University's political scientist Gary King.

Supported by NSF's Methodology, Measurement and Statistics Program and by the Political Science Program, Dr. King has devised a new statistical model and implemented it in computer software. The work also resulted in a Princeton University Press book: A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data.

In the history of ecological inference literature, only 49 comparisons exist between estimates from aggregate data and the known data on an individual level - a reflection of the field's focus on hypothesis and theory without economic, sociological or other foundations. In contrast, one of the linchpins of King's methodological model is that it is not merely theoretical, but validated with extensive real-life data. King tested his statistical method with data sets of groups for which individual behaviors were known, making more than 16,000 comparisons from five data sets between his estimates and individuals' actual behavior. For example, estimates of the levels of African-American and Caucasian voter registration were compared to the known answer in public records. The method does not always work, since information is lost in the aggregation process, but King's approach indicates how much information is left in aggregate data to make inferences about individuals. As such, it is possible to learn when the inferences will be relatively certain and when they will not.
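The idea that aggregate data still carries some recoverable information can be illustrated with the classic "method of bounds" for a single precinct. The sketch below is not King's statistical model; it only shows the deterministic limits that any estimate must respect. The function name `rate_bounds` and the precinct figures are hypothetical and chosen purely for illustration.

```python
def rate_bounds(group_share, overall_rate):
    """Deterministic bounds on one group's rate within a single precinct.

    group_share:  fraction of the precinct's population in the group (0 < x <= 1)
    overall_rate: fraction of the whole precinct with the outcome,
                  e.g. registered to vote (0 <= t <= 1)

    Only these two aggregate numbers are observed. The group's own rate
    is unknown, but simple arithmetic limits the values it can take.
    """
    if not 0 < group_share <= 1:
        raise ValueError("group_share must be in (0, 1]")
    if not 0 <= overall_rate <= 1:
        raise ValueError("overall_rate must be in [0, 1]")

    # Lowest possible rate: everyone outside the group has the outcome.
    lower = max(0.0, (overall_rate - (1.0 - group_share)) / group_share)
    # Highest possible rate: nobody outside the group has the outcome.
    upper = min(1.0, overall_rate / group_share)
    return lower, upper


# Hypothetical precincts: (share of population in the group, overall rate).
precincts = {"A": (0.9, 0.5), "B": (0.1, 0.5)}

for name, (share, rate) in precincts.items():
    low, high = rate_bounds(share, rate)
    print(f"Precinct {name}: group rate lies between {low:.2f} and {high:.2f}")
```

In hypothetical precinct A, where the group makes up 90 percent of the population, the group's rate is pinned down to roughly 0.44 to 0.56. In precinct B, where the group is only 10 percent, the bounds run from 0 to 1 and the aggregate figures alone say nothing. This is one simple way to see why some ecological inferences are relatively certain and others are not.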

In the words of Frank Scioli, director of NSF's Political Science Program, "I expect Gary King's solution will contribute to the production of more accurate, insightful data analysis in a variety of research studies, leading to more informed policy making and better understanding of our economy and society." King's solution to the ecological inference problem can, for example…

  • ...give government researchers and public policy makers the ability to better analyze data and learn about individual behavior in the face of massive amounts of statistical information on aggregates such as congressional districts and census blocks. King's method may complement work using national public opinion polls, which require inferences about localities like congressional districts to be based on only a dozen personal interviews.
  • ...offer epidemiologists more information on the spread of illness. For example, they may know general degrees of county-level radon exposure, and the overall number of people in the county with lung cancer, but with King's method they can now estimate the fraction of individuals with high radon exposure who are diagnosed with lung cancer.
  • ...aid educational researchers, restricted by privacy rules in schools and colleges, to better assess learning trends, student attrition, the value of school choice and other pressing issues.
  • ...revisit historical research. For instance, scholars know where working class people lived in Nazi Germany, and the areas that voted for the Nazi party, but they can now make ecological inferences to learn about the number of working class voters (et al.) who favored the party.
  • ...help marketing researchers link publicly available data, such as census information, with purchasing statistics and other consumer data to delineate demographic trends and more.
  • ...arm the mass media with more accurate methods of analyzing public opinion polls.

The ecological inference problem arises from attempts to predict individual behavior based on aggregate data of group behavior.


The implications of the ecological inference problem typically become apparent in situations involving electoral politics, when attempts to determine the behavior of a specific subset of the voting public at the local level are made based on state or nationwide statistics.


Sample output from Dr. Gary King's program EzI, a Windows-based software program that provides a method for inferring individual attributes from aggregate data that works in practice. It implements the statistical procedures, diagnostics, and graphics from Dr. King's book A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data.



Harvard University political scientist Dr. Gary King, supported by NSF's Methodology, Measurement and Statistics Program and by the Political Science Program, has devised a new statistical model and implemented it in computer software. The work also resulted in a Princeton University Press book:

Cover to Dr. King's book, A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data

For more information please see:

Dr. King's web site at: http://gking.harvard.edu/

King, G. (1997). A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton, NJ: Princeton University Press.

King, G. (1998). EI: A Program for Ecological Inference (Gauss) and (with Benoit, K.) EzI: A(n Easy) Program for Ecological Inference (Windows 95). This program provides easy-to-use methods of running all the statistical procedures, diagnostics, and graphics developed in A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data (Princeton University Press, 1997).
This software is available at Dr. King's website.

This research is supported by the Methodology, Measurement and Statistics Program and by the Political Science Program.

All photos and illustrations are copyright© of their respective owners and may not be used without permission.