What are Raw Data?
What are Weighted Data?
How were the FHCS Data Weighted?
Why are Data Weighted?
What is the Margin of Error?
Significance of Comparison
What are Raw Data?
The data as collected from the survey are called raw data, as opposed to weighted data. In raw data, we do not adjust for over-representation or under-representation of any subgroup. Comparing the distribution of demographic characteristics in the raw sample data with that of the population reveals which subgroups of survey respondents are under-represented or over-represented. Examples of these characteristics are agency, gender, supervisory status, and minority status.
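As a rough illustration, this check can be done by comparing the two distributions directly. The gender figures below are hypothetical and simply echo the example used later in this document; they are not FHCS values.

```python
# Hypothetical gender distributions; a subgroup whose sample share exceeds its
# population share is over-represented, and vice versa.
population_pct = {"men": 20.0, "women": 80.0}   # e.g., from personnel records
sample_pct     = {"men": 60.0, "women": 40.0}   # from the raw survey responses

for group in population_pct:
    gap = sample_pct[group] - population_pct[group]
    status = "over-represented" if gap > 0 else "under-represented"
    print(f"{group}: {gap:+.1f} percentage points ({status})")
```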
What are Weighted Data?
The collected survey data, adjusted so that they represent the population from which the sample was drawn, are called weighted data. The data might be weighted on the basis of demographic characteristics such as gender, race, supervisory status, age, and so on.
How were the FHCS Data Weighted?
The FHCS data were weighted on the following demographic categories:
Agency/Suborganization
Supervisory Status: Non-Supervisor/Supervisor/Executive
Gender: Male/Female
Race and National Origin: Non-minority/Minority
Why are Data Weighted?
Data are weighted in order to generalize the findings to the population. A weight is a ratio: the percentage of the data in a certain category in the population divided by the percentage of the data in the same category in the sample. For example, suppose a population contains 20 percent men and 80 percent women; this is the universe from which we draw a sample for a survey. In the sample data, there are 60 percent men and 40 percent women, so the sample over-represents men and consequently under-represents women. To draw realistic conclusions from the survey sample, we have to adjust the data for this imbalance. Dividing 20 percent (from the population) by 60 percent (from the sample) gives 0.33 as the weight for men; dividing 80 percent (from the population) by 40 percent (from the sample) gives 2.0 as the weight for women. Weights are numbers, not percents. Giving a weight of 0.33 to men reduces the impact of their responses, and giving a weight of 2.0 to women increases the impact of theirs, so that the weighted sample reflects reality.
Thus, considering gender alone we get 2 weights; considering supervisory status alone we get 3 weights; considering both categories together we have 6 weights (2 x 3); and adding the minority category (2) and agency/suborganization (190), we have 2,280 different weights for the data.
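The weighting arithmetic described above can be sketched in a few lines. The category names and percentages below are the hypothetical gender figures from the example, not actual FHCS values.

```python
population_pct = {"men": 20.0, "women": 80.0}   # share of each group in the population
sample_pct     = {"men": 60.0, "women": 40.0}   # share of each group in the sample

# weight = population percentage / sample percentage, computed per category
weights = {group: population_pct[group] / sample_pct[group] for group in population_pct}
# -> {'men': 0.33..., 'women': 2.0}
```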
For the FHCS, we obtain a percentage distribution by all of these subcategories from the population database (CPDF), and a percentage distribution by the same subcategories from the survey respondents. Dividing each population percentage by the corresponding sample percentage gives one weight, which is then attached to every record that falls in that specific demographic cell. That is, every record receives a weight based on its demographic characteristics, and the weights are applied to the responses in the analysis.
The advantage of using weighted data is that percentages calculated from the sample are unbiased estimates of the population percentages. Since describing the population is the main purpose of any survey and we only have data from a sample, we have to adjust the data to generalize to the population, in this case, governmentwide perceptions.
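As a minimal sketch of how such weights enter the analysis, assume each survey record is tagged with the weight for its demographic cell; a weighted percent positive is then the weighted count of positive responses divided by the total weight. The records and weights below are illustrative only, not the FHCS processing itself.

```python
weights = {"men": 20.0 / 60.0, "women": 80.0 / 40.0}   # gender weights from the earlier example

records = [                                   # hypothetical survey responses
    {"gender": "men",   "positive": True},
    {"gender": "men",   "positive": False},
    {"gender": "women", "positive": True},
    {"gender": "women", "positive": False},
]

total_weight    = sum(weights[r["gender"]] for r in records)
positive_weight = sum(weights[r["gender"]] for r in records if r["positive"])
weighted_pct_positive = 100.0 * positive_weight / total_weight   # 50.0 for these records
```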
What is the Margin of Error?
Since we use the weighted sample data to estimate response percentages in the population, we expect an estimate approximately equal to the actual population percentage. The difference between the population percentage and its estimate is the sampling error. In drawing a conclusion from the sample data, there is always a chance that the conclusion is wrong because of this sampling error.
When an estimate of a population percentage is stated, we would like to know the quality of that estimate. To take sampling error into account, we compute a confidence interval for each population percentage. This interval gives the confidence that the actual percentage is within plus or minus "x" percent of our estimate. For example, we may have 95 percent confidence that the population percentage is within plus or minus 4 percent of our estimate of 45 percent; the 95 percent confidence interval would then be from 41 percent to 49 percent. The smaller the margin of error, the better the estimate. If the margin of error is small, we are confident in the conclusion; if it is large, we are cautious about it. Because the survey sample is large, the margin of error tends to be small. In this survey, the 95 percent confidence interval for governmentwide percentages has a margin of error of plus or minus 1 percent. The margin of error for agency percentages is somewhat higher, but is less than plus or minus 5 percent.
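For readers who want the arithmetic, the sketch below uses the textbook confidence-interval formula for a proportion under simple random sampling; the effective sample size is hypothetical, and the survey's actual margins of error also reflect the weighting and design.

```python
import math

p_hat = 0.45   # estimated proportion positive (45 percent), as in the example above
n     = 600    # hypothetical effective sample size
z     = 1.96   # two-sided critical value for 95 percent confidence

margin_of_error = z * math.sqrt(p_hat * (1.0 - p_hat) / n)                 # about 0.04 (4 percent)
confidence_interval = (p_hat - margin_of_error, p_hat + margin_of_error)   # roughly (0.41, 0.49)
```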
Let us take an example from hypothesis testing for those who would like some technical review. Hypothesis tests in this survey compare the responses of two population subgroups to determine whether the percentage of employees responding positively to a questionnaire item differs between the two groups. We test the null hypothesis that there is no difference between the two subgroups against the alternative that there is a difference. In hypothesis testing, there are two types of error: Type I and Type II. A Type I error occurs when we reject a true null hypothesis; a Type II error occurs when we accept (fail to reject) a false null hypothesis. For a fixed sample size, decreasing the probability of a Type I error increases the probability of a Type II error. In hypothesis testing we control the Type I error by setting its probability to a small number such as 5 percent. Assuming that the null hypothesis is true, we select a threshold value for comparison; the null hypothesis is rejected if the computed test statistic is greater than the threshold value at the 5 percent probability level.
For all the tests performed in the analysis of the survey, the significance level is 5 percent unless specifically noted otherwise in a given situation.
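The sketch below shows the textbook two-proportion z-test at the 5 percent level; the counts and percentages are hypothetical, and the survey's actual tests also account for the weighting.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)          # pooled proportion under the null hypothesis
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se

# Hypothetical figures: 62 percent positive among 400 agency respondents versus
# 58 percent positive among 3,000 governmentwide respondents.
z = two_proportion_z(0.62, 400, 0.58, 3000)
reject_null = abs(z) > 1.96   # 1.96 is the two-sided threshold at the 5 percent level
```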
Significance of Comparison
In general, we compare either the responses from an agency to all responses governmentwide, or the responses from an agency sub-element to the responses from its parent agency. The null hypothesis is that there is no difference between the responses of the selected groups. For example, an agency's difference is obtained by subtracting the governmentwide percent positive from the agency percent positive. The difference is statistically significant if there is less than a 5 percent probability of obtaining a calculated sample difference at least as large in absolute value, given that the actual difference (obtained from a complete census) is zero.
Positively Significant:
The computed difference is significantly higher if there is less than a 2.5 percent chance of obtaining a positive sample difference as large or larger, given no actual difference.
Negatively Significant:
The computed difference is significantly lower if there is less than a 2.5 percent chance of obtaining a negative sample difference as large or larger, given no actual difference.
Neutral:
The percentage difference is neutral if neither of the above conditions is satisfied.
Please note: The sample difference required varies with sample size. The larger the sample size, the smaller the observed difference needed to conclude significance.
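The classification rule can be summarized as below, reusing a z statistic such as the one sketched earlier; 1.96 is the standard normal cutoff that leaves 2.5 percent in each tail.

```python
def classify_difference(z, cutoff=1.96):
    """Label a difference using the two-sided 5 percent rule (2.5 percent per tail)."""
    if z > cutoff:
        return "positively significant"
    if z < -cutoff:
        return "negatively significant"
    return "neutral"
```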