Federal Human Capital Survey 2002: About the Data
Contents:

- What are Raw Data?
- What are Weighted Data?
- How were the FHCS Data Weighted?
- Why are Data Weighted?
- What is the Margin of Error?
- Significance of Comparison

What are Raw Data?

The data as collected from the survey are called raw data, as opposed to weighted data. Raw data are not adjusted for the over-representation or under-representation of any subgroup.

Comparing the distribution of demographic characteristics in the raw sample with the corresponding distribution in the population reveals which subgroups of survey respondents are under-represented or over-represented. Examples of these characteristics are agency, gender, supervisory status, and minority status.
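
As an illustration, the sketch below (Python, with hypothetical percentages that anticipate the gender example used later on this page) compares a raw sample's demographic distribution against the population's to flag over- and under-represented subgroups.

```python
# Hypothetical population and raw-sample distributions for one
# demographic characteristic (gender).
population_pct = {"men": 20.0, "women": 80.0}   # e.g., from personnel records
sample_pct     = {"men": 60.0, "women": 40.0}   # from the raw survey data

for group in population_pct:
    diff = sample_pct[group] - population_pct[group]
    if diff > 0:
        status = "over-represented"
    elif diff < 0:
        status = "under-represented"
    else:
        status = "represented proportionally"
    print(f"{group}: sample {sample_pct[group]:.0f}% vs "
          f"population {population_pct[group]:.0f}% -> {status}")
```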


What are Weighted Data?

Survey data that have been adjusted to represent the population from which the sample was drawn are called weighted data. The data may be weighted on the basis of demographic characteristics such as gender, race, supervisory status, and age.


How were the FHCS Data Weighted?

The FHCS data were weighted on the following demographic characteristics:

- Agency/Suborganization
- Supervisory Status: Non-Supervisor/Supervisor/Executive
- Gender: Male/Female
- Race and National Origin: Non-minority/Minority


Why are Data Weighted?

Data are weighted in order to generalize the findings to the population. A weight is a ratio: the percentage of the population in a given category divided by the percentage of the sample in that same category. For example, suppose a population is 20 percent men and 80 percent women; this is the universe from which we draw a survey sample. In the sample data, 60 percent of respondents are men and 40 percent are women, so men are over-represented and women are under-represented. To draw a realistic conclusion from the survey sample, we must adjust the data for this imbalance. Dividing 20 percent (from the population) by 60 percent (from the sample) gives 0.33 as the weight for men; dividing 80 percent (from the population) by 40 percent (from the sample) gives 2.0 as the weight for women. Weights are numbers, not percentages. Giving a weight of 0.33 to men reduces the impact of their responses, and giving a weight of 2.0 to women increases the impact of theirs, so that the weighted sample reflects the population.
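
The sketch below reproduces this arithmetic in Python. The population and sample percentages come from the example above; the positive-response rates are hypothetical, added only to show how the weights enter a weighted estimate.

```python
# Weight = population share / sample share, per category.
population_pct = {"men": 0.20, "women": 0.80}   # from the example above
sample_pct     = {"men": 0.60, "women": 0.40}

weights = {g: population_pct[g] / sample_pct[g] for g in population_pct}
print(weights)  # {'men': 0.333..., 'women': 2.0}

# Hypothetical: 50% of sampled men and 70% of sampled women
# answered a survey item positively.
positive = {"men": 0.50, "women": 0.70}

# Weighted estimate of the population's positive-response percentage:
# each subgroup's sample share is rescaled by its weight before summing.
estimate = sum(sample_pct[g] * weights[g] * positive[g] for g in positive)
print(f"weighted positive estimate: {estimate:.0%}")  # 0.20*0.50 + 0.80*0.70 = 66%
```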

Thus, gender alone gives 2 weights, supervisory status alone gives 3, and the two together give 6 (2 x 3). Adding the minority category (2) and agency/suborganization (190) to the overall data yields 2 x 3 x 2 x 190 = 2,280 different weights.

For the FHCS, we obtain a percentage distribution across all subcategories from the population database, the Central Personnel Data File (CPDF), and a corresponding distribution across the same subcategories from the survey respondents. Dividing each population percentage by the matching sample percentage yields one weight, which is then propagated to every record that matches that specific combination of demographics. That is, every record carries a weight based on its demographic characteristics, and the analysis counts each response according to its weight.
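
A minimal sketch of this propagation step, using pandas with toy data and hypothetical column names; for brevity it weights on gender alone, whereas the FHCS weighting crossed agency, supervisory status, gender, and minority status.

```python
import pandas as pd

# Toy respondent records (hypothetical column names).
records = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4, 5],
    "gender": ["M", "M", "M", "F", "F"],
})

# Population distribution for the weighting cell (e.g., from the CPDF).
population = pd.DataFrame({"gender": ["M", "F"], "pop_pct": [20.0, 80.0]})

# Sample distribution computed from the respondents themselves.
sample = (
    records["gender"].value_counts(normalize=True).mul(100)
    .rename("samp_pct").rename_axis("gender").reset_index()
)

# Weight per cell, then propagated to every matching record.
cells = population.merge(sample, on="gender")
cells["weight"] = cells["pop_pct"] / cells["samp_pct"]
records = records.merge(cells[["gender", "weight"]], on="gender")
print(records)  # men carry weight 0.33..., women carry weight 2.0
```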

The advantage of using weighted data is that percentages calculated from them are unbiased estimates of the population percentages. Since the population is the real object of study in any survey, and we only have data from a sample, we must adjust the data to generalize to the population, in this case governmentwide perceptions.


What is the Margin of Error?

Since we use the weighted sample data to estimate response percentages in the population, we expect each estimate to be approximately equal to the actual population percentage. The difference between the population percentage and its estimate is the sampling error, and any conclusion drawn from sample data carries some chance of being wrong because of it.

When we state an estimate of a population percentage, we would like to know the quality of that estimate. To take sampling error into account, we compute a confidence interval for each population percentage: a range of plus or minus "x" percent around the estimate within which we are confident, at a stated level, that the actual percentage lies. For example, we may have 95 percent confidence that the population percentage is within plus or minus 4 percent of our estimate of 45 percent; the 95 percent confidence interval would then run from 41 percent to 49 percent. The smaller the margin of error, the better the estimate: a small margin makes us confident in a conclusion, while a large one makes us cautious. Because this survey has a large sample, the margin of error tends to be small. In this survey, the 95 percent confidence interval for governmentwide percentages has a margin of error of plus or minus 1 percent; the margin of error for agency percentages is somewhat higher, but less than plus or minus 5 percent.
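
The sketch below reproduces this margin-of-error arithmetic using the normal approximation for a proportion. The 45 percent estimate comes from the example above; the sample size of 600 is hypothetical, chosen so the margin comes out near plus or minus 4 percent.

```python
import math

z_95 = 1.96      # critical value for a 95 percent confidence level
p_hat = 0.45     # estimated percentage, as a proportion (from the example)
n = 600          # hypothetical number of respondents

margin = z_95 * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin
print(f"estimate {p_hat:.0%}, margin of error +/-{margin:.1%}, "
      f"95% CI [{low:.1%}, {high:.1%}]")
```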

For those who would like some technical review, consider how the hypothesis tests work. Hypothesis tests in this survey compare the responses of two population subgroups to determine whether the percentage of employees responding positively to a questionnaire item differs between the two groups. We test the null hypothesis that there is no difference between the subgroups against the alternative that there is a difference. Two types of error are possible: a Type I error occurs when we reject a true null hypothesis, and a Type II error occurs when we accept (fail to reject) a false one. For a fixed sample size, decreasing the probability of Type I error increases the probability of Type II error. We control Type I error by setting its probability to a small number, such as 5 percent: assuming the null hypothesis is true, we select a threshold (critical) value, and we reject the null hypothesis if the computed test statistic exceeds that threshold.
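
A minimal sketch of such a two-subgroup comparison, written as a standard two-proportion z-test with hypothetical counts; the document does not specify the exact test statistic used, so this is illustrative only.

```python
import math

pos1, n1 = 540, 1000   # subgroup 1: positive responses, respondents (hypothetical)
pos2, n2 = 480, 1000   # subgroup 2

p1, p2 = pos1 / n1, pos2 / n2
p_pool = (pos1 + pos2) / (n1 + n2)    # pooled proportion under the null hypothesis
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

threshold = 1.96                      # two-sided test at the 5 percent level
print(f"z = {z:.2f}; reject null hypothesis: {abs(z) > threshold}")
```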

All tests performed in the analysis of the survey use a significance level of 5 percent, unless otherwise noted for a given situation.


Significance of Comparison

In general, we compare either an agency's responses to all responses, or an agency sub-element's responses to those of its parent agency. The null hypothesis is that there is no difference between the responses of the selected groups. For example, the difference is obtained by subtracting the governmentwide percent positive from the agency percent positive. The difference is statistically significant if there is less than a 5% probability of obtaining a calculated sample difference at least as large in absolute value, given that the actual difference (as would be obtained from a complete census) is zero.

Positively Significant:

The computed difference is significantly higher if there is a less than 2.5% chance of obtaining a positive sample difference as large or larger, given no actual difference.

Negatively Significant:

The computed difference is significantly lower if there is a less than 2.5% chance of obtaining a negative sample difference as large or larger, given no actual difference.

Neutral:

The percentage difference is neutral if neither of the above conditions is satisfied.

Please note: the sample difference required for significance varies with sample size. The larger the sample, the smaller the observed difference needed to conclude significance.
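
A minimal sketch of this three-way classification, using a normal approximation; the agency and governmentwide percentages, sample size, and standard error are hypothetical placeholders.

```python
import math

def classify(diff, se, z_crit=1.96):
    """Classify an agency-minus-governmentwide difference at the 5 percent
    level (2.5 percent in each tail)."""
    z = diff / se
    if z > z_crit:
        return "positively significant"
    if z < -z_crit:
        return "negatively significant"
    return "neutral"

# Hypothetical example: agency 52% positive vs. governmentwide 48% positive,
# with a standard error based on a hypothetical agency sample of 900.
diff = 0.52 - 0.48
se = math.sqrt(0.48 * 0.52 / 900)
print(classify(diff, se))  # -> "positively significant"
```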

