NSF PR 97-23 - March 24, 1997
This material is available primarily for archival purposes. Telephone
numbers or other contact information may be out of date; please see current
contact information at media
contacts.
Solution Found to Long-Standing Inconsistencies in
Data Analysis
FINAL EXAM QUESTION: 40% of children in a high school
participated in a special college preparation program,
and 40% of students from that high school went on
to college. For 50 bonus points, what fraction of
participants in the college prep program went on to
college?
This is a trick question. Until now, the only way
to be sure of the answer would be to violate confidentiality
laws and track down the individual students.
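Why the question is a trick can be shown with simple arithmetic. The
sketch below applies the classical "method of bounds," which is not
King's statistical method, to the two percentages in the question. The
function name and its parameters are illustrative assumptions, not taken
from King's software.

```python
def duncan_davis_bounds(x, t):
    """Return (low, high) for the share of group members with the outcome.

    x: fraction of the population in the group (here, program participants)
    t: fraction of the population with the outcome (here, went to college)
    """
    if not (0.0 < x <= 1.0) or not (0.0 <= t <= 1.0):
        raise ValueError("x must be in (0, 1] and t must be in [0, 1]")
    # Lowest case: as many outcomes as possible fall outside the group.
    low = max(0.0, (t - (1.0 - x)) / x)
    # Highest case: as many outcomes as possible fall inside the group.
    high = min(1.0, t / x)
    return low, high


# The figures from the exam question: 40% participated, 40% went to college.
low, high = duncan_davis_bounds(x=0.40, t=0.40)
print(f"Between {low:.0%} and {high:.0%} of participants went to college")
# prints: Between 0% and 100% of participants went to college
```

With these figures the group totals alone rule nothing out: every
participant could have gone to college, or none of them.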
Now, a National Science Foundation (NSF)-supported
political scientist has a solution to a long-standing,
consequential problem in social science methodology:
how to learn about the behavior of individuals when
the only information available is on groups.
The solution may have been found in a statistical
method developed by Gary King, professor of government
at Harvard University. His new algorithm for computer
software is reported in a book recently published
by Princeton University Press, A Solution To The
Ecological Inference Problem: Reconstructing Individual
Behavior From Aggregate Data.
King's new method may have a significant impact on
a range of research problems, such as epidemiological
studies of radon and lung cancer, market research
on consumer behavior and implementation of the Voting
Rights Act. The American Political Science Association
has selected King to receive its Gosnell Award "for
the best methodological work in political science
in 1995-96" for his research on this subject.
"I expect Gary King's solution will contribute to
the production of more accurate, insightful data analysis
in a variety of research studies, leading to more
informed policy-making and better understanding of
our economy and society," Frank Scioli, director of
NSF's political science research program, says.
Inferring individual behavior from statistics recorded
about groups, known as the "ecological inference problem,"
was originally posed over 75 years ago. It was the
first statistical problem encountered in the new field
of political science. Scholars soon recognized the
same problem in numerous other areas, and since then
researchers have pursued a solution.
"Ecological inference is required whenever surveys
are unavailable, unreliable or too expensive," says
King. "Surveys cannot address most historical questions
unless they are conducted then and there. They are
also unreliable for studying controversial issues,
such as racial politics, since respondents do not
always report their opinions and behaviors accurately."
The ecological inference problem was originally raised
in 1919 by scholars seeking to know how women, who
were about to have the vote nationwide, would decide
to cast their ballots. Although women had voted in
some state elections, and these data were available,
the secret ballot and the ecological inference problem
prevented analysts from distinguishing the votes of
women from the remaining (male) votes in the same
electoral precincts.
The United States and other governments produce enormous
quantities of statistical data on aggregates such
as towns, cities, congressional districts and census
blocks. A solution to the ecological inference problem
will give researchers and public policy makers the
ability to better analyze data and learn about individual
behavior.
King tested his method with data sets of groups for
which the individual behaviors were known. He made
more than 16,000 comparisons between his estimates
and the known individuals' behaviors. NSF provided
the support to gather the data and to develop methods
for its analysis.
Attachment: Applications
of the Ecological Inference Solution
Several research areas may benefit from the ecological
inference solution developed by Gary King, professor
of government at Harvard University, with the support
of the National Science Foundation.
For more information on his research, see
http://gking.harvard.edu.
- In marketing, researchers know the fraction of
married people in each zip code area (from census
data), and the number of refrigerators purchased,
but need to know what fraction of married people
purchase refrigerators.
- In epidemiology, information is available at the
county level on degrees of radon exposure, and
the number of people who have lung cancer, but
researchers need to know the fraction of individuals
with high radon exposure who are diagnosed with
lung cancer.
- In historical research, it is known where working
class people lived in Nazi Germany, and the areas
that voted for the Nazi party, but scholars need
to make ecological inferences to learn about the
fraction of working class voters (and others)
who voted for the Nazis.
- In education, researchers who wish to assess the
value of school choice programs have school-level
measures, such as the dropout rate or the percentage
of students who attend college, as well as the
proportion of each private school's students
who paid with a voucher. Because of privacy rules,
researchers must make ecological inferences to
learn about the fraction of voucher students who
attend college, or the fraction of non-voucher
students who drop out.
- Ecological inferences are required in several
areas of public policy, such as implementing the
Voting Rights Act, where courts have required
estimates of the degree to which minority groups
vote differently than whites.
- Elected officials need to make ecological inferences
when they attempt to determine the policy preferences
of different groups of their constituents.