Uncertainty in Social Security's Long-Term Finances: A Stochastic Analysis

December 2001

CHAPTER IV

MEASURING UNCERTAINTY ABOUT INPUT ASSUMPTIONS

Given the available historical data for each of the inputs that go into projections of the Social Security trust funds, actuaries at the Social Security Administration do what sophisticated statisticians do when asked to forecast values for the next 75 years: they assume that future values will follow a pattern consistent with the past. To gauge how sensitive trust fund projections are to the assumed ultimate value for each input, the actuaries use high- and low-cost scenarios, also based on historical data.

That approach does not adequately reflect uncertainty about future trust fund balances, however, for three reasons. First, any average level for an input variable over 75 years is consistent with many possible annual paths (having different fluctuations), with variations in averages over five or 10 years, and with differences in that variable between birth cohorts--all of which can cause variation in the pattern of trust fund accumulations. Second, SSA's actuarial methods do not provide relative measures of the level of uncertainty of each input. Thus, some questions remain unanswered. For instance, is the projection of the unemployment rate or interest rate more uncertain than the projection of fertility? Third, projections by scenario do not incorporate any overall measures of probability for the input scenarios; without such measures, it is impossible to evaluate the likelihood of the results.

Developing better estimates of uncertainty about inputs is essential to developing better estimates of uncertainty about Social Security's finances. To truly measure uncertainty about the future values of inputs--in particular, to estimate probability distributions for annual values--it is appropriate to start (as SSA's actuaries do) with historical data. (The obvious limitation of such data is that future variation may differ from historical variation. To the extent that it does, estimates of uncertainty may be understated or overstated.)

The standard statistical tool for making inferences from historical data is time-series analysis. Such analysis starts by breaking down the historical changes in variables into three main components: annual random shocks that are either positive or negative (but centered around zero), year-to-year correlations in annual values, and random changes in the central tendency of the annual values. Because many variables--such as inflation, unemployment, the real interest rate, real wage growth, and the rate of mortality improvement--seem to have no random change in central tendency over long periods, analysts frequently need to model only the first two sources of change. Whether other variables--including the fertility rate, immigration, and disability incidence or termination--show changes in central tendency depends on how the historical tea leaves are read.

The decision about whether to incorporate random changes in central tendency is important because it dramatically affects conclusions about the possible range (and thus the probability distribution) of future values. In particular, if no random change in central tendency occurs, outcomes will vary within a probability range that is constant over time. For example, the range of possible outcomes for a variable such as inflation in 2075 would probably be the same as the range of outcomes in 2010. Allowing random changes in central tendency, by contrast, suggests that the range of possible outcomes will widen over time. For example, the range of outcomes for fertility in 2075 could be much wider than the range in 2010 because changes in central tendency generally occur gradually. In the short run, the fertility rate is likely to vary around a fairly predictable central tendency; but in the long run, fundamental social changes could affect average fertility.

PRODUCING A FORECAST WITH TIME-SERIES AND MONTE CARLO TECHNIQUES

The time-series analysis employed in this paper uses historical data to project an input's estimated variability (specifically, its probability distribution) around SSA's intermediate projection for that input. The Congressional Budget Office is interested in estimating the uncertainty of SSA's forecast, not in replacing the forecast itself. For the majority of variables, the projections that CBO generated through time-series analysis are quite similar to those used by the Social Security trustees. Most differences arise because the trustees weight recent experience more heavily. In some cases, the projections also differ because the trustees have used expert judgment about the relevance of past values to future trends.

Time-series analysis uses historical data to project both the future values of variables and the variability around those future values. In the first step, the movements in a variable (or group of variables) are broken down mathematically into the three components described above, and the resulting estimated equation (or set of equations) is used to generate expected future values simply by solving it forward through time.

Given an estimated equation, the second step involves employing computer simulation to generate probability distributions for future outcomes. The procedure used to produce those distributions is called Monte Carlo simulation because it involves making repeated random draws, in a mathematically structured way, from the values for annual shocks (much like what occurs in a casino, where, for example, the probability of rolling a particular number on a die is one in six).⁽¹⁾ Those annual random draws are plugged into the time-series equation, which then generates a time path of outcomes for the variable in question. By repeating the simulation process many times, analysts can draw inferences about the probability distribution of future outcomes.

Mathematical Models for Projecting a Time Series

Measuring the uncertainty of input variables using time-series analysis is inherently different from developing intermediate values or high- and low-cost scenarios because the statistical techniques for time-series analysis are designed to infer more than a best guess for the range of the long-term average. At one level, the time-series approach uses a fairly simple equation to explain how variables change from year to year. However, because the string of annual changes adds to long-term changes, the technique actually generates both short-term fluctuations and long-term trends simultaneously.

A casual look at the graphs of the inputs being modeled (Figures 2 through 10 in Chapter II) suggests that there is more than one reason why a variable changes at any point in time. One observation is that most of the inputs being modeled--inflation, unemployment, the real interest rate, real wage growth, and mortality improvement--have no trend change in central tendency over the long term. In that case, mathematical models need only incorporate the first two sources of change: annual random shocks and correlations of annual values over time.

Analysts generally assume that the probability distribution of annual random shocks can be approximated with the well-known "normal" pattern. In that standard approach, the values of random shocks have an expected level, or mean--in this case zero--with a symmetric bell-shaped distribution around that expected level. Thus, analysts are much more likely to draw a random shock that is close to the mean than one that is distant.

If the projected outcomes for a variable composed only of a long-run average and annual random shocks were graphed, all of the values would be centered around the average value for the variable because, by definition, the expected value of the random shocks is zero. In addition, the graph would have several features: approximately the same number of high and low values; more values close to the average than far away, because the distribution of the shocks is normal (bell shaped); and finally--a crucial distinction--no pattern that connects the values over time. (Outcomes in each year would be independent of the outcomes in the previous year.)⁽²⁾
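The properties listed above can be checked numerically. The following is a minimal sketch, assuming an arbitrary long-run average of 1.0 and a shock standard deviation of 2.0 (neither value comes from CBO's estimates); the function names are introduced here for illustration only.

```python
import random
import statistics


def simulate_white_noise(mean, shock_sd, n_years, seed=0):
    """Return n_years of values: a fixed mean plus independent normal shocks."""
    rng = random.Random(seed)
    return [mean + rng.gauss(0.0, shock_sd) for _ in range(n_years)]


def lag1_autocorrelation(values):
    """Sample correlation between each value and the value one year earlier."""
    center = statistics.fmean(values)
    numerator = sum((a - center) * (b - center)
                    for a, b in zip(values[1:], values[:-1]))
    denominator = sum((v - center) ** 2 for v in values)
    return numerator / denominator


# Illustrative values only: long-run average 1.0, shock standard deviation 2.0.
series = simulate_white_noise(mean=1.0, shock_sd=2.0, n_years=5000)
share_above = sum(v > 1.0 for v in series) / len(series)
share_near = sum(abs(v - 1.0) < 2.0 for v in series) / len(series)

print("mean:", round(statistics.fmean(series), 2))
print("share above the average:", round(share_above, 2))
print("share within one standard deviation:", round(share_near, 2))
print("lag-1 autocorrelation:", round(lag1_autocorrelation(series), 2))
```

In such a series the sample mean sits near the long-run average, roughly half of the values lie above it, most values lie within one standard deviation, and the lag-1 autocorrelation is close to zero.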

That description of a variable that has only an average value and annual random shocks does not appear to fit any of the inputs that go into projections of Social Security's finances. Rather, all of those inputs (even the ones with apparently stable long-run central tendencies) seem to move in one direction or another and then stay there for long periods--implying high correlation between outcomes from year to year--before moving back. For example, inflation was generally high in the 1940s, fairly low through the early 1970s, generally high for the next decade or so, and then generally low again (see Figure 6 in Chapter II). Clearly, variation occurs from year to year, but the outcomes also seem to be correlated over time.⁽³⁾

How much of a particular change is attributable to random shocks and how much to correlation between values over time? Time-series analysis specifies a simple equation for a variable and allows the data to answer that question. In the simplest specifications, the equation relates the current-period value of a variable to three things: a constant term (the central tendency), the value of the variable during the most recent period (in order to capture the correlation over time), and an error term (the random shock).⁽⁴⁾ More-complicated versions of time-series equations involve adding more lagged terms (not just for the most recent period but for two, three, or more previous periods) or employing a "moving average" of error terms, in which random shocks themselves affect outcomes for more than one period.⁽⁵⁾
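In symbols, the specifications described above can be written as follows. The notation (y for the variable, c for the constant, phi for the lag coefficient, epsilon for the shock, theta for the moving-average weights) is introduced here for illustration and may differ from the notation in the paper's footnotes and appendix.

```latex
% Simplest specification: a constant, one lagged value, and a random shock.
y_t = c + \phi\, y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2)

% When |\phi| < 1, the series fluctuates around a fixed long-run mean:
\mathrm{E}[y_t] = \frac{c}{1 - \phi}

% Richer version with p lagged values and a moving average of q past shocks:
y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j}
```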

How can a user tell if the correct equation was chosen to represent the time series being modeled? The answer is to go back to the premise underlying the equation. If the equation is appropriate, the residuals (error terms) derived from it will have the properties associated with a series of normally distributed random shocks--they will be centered around zero, have the same number of high and low values, have more realizations close to zero than far away, and show no correlation over time. Thus, the time-series approach involves specifying an equation, estimating the parameters with historical data, and testing whether the residuals are consistent with a series of random shocks.⁽⁶⁾
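The specify, estimate, and test sequence can be sketched on synthetic data. This is a minimal illustration, not CBO's procedure: the "history" is generated from made-up parameters (constant 0.5, coefficient 0.6), the fit is ordinary least squares, and the lag-1 autocorrelation of the residuals is only a crude stand-in for the formal test statistics reported in the appendix.

```python
import random
import statistics


def fit_ar1(values):
    """Least-squares fit of y[t] = const + coef * y[t-1] + residual[t]."""
    lagged, current = values[:-1], values[1:]
    mean_x, mean_y = statistics.fmean(lagged), statistics.fmean(current)
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, current))
    coef = sxy / sxx
    const = mean_y - coef * mean_x
    residuals = [y - (const + coef * x) for x, y in zip(lagged, current)]
    return const, coef, residuals


def lag1_autocorrelation(values):
    """Sample correlation between each value and the value one year earlier."""
    center = statistics.fmean(values)
    numerator = sum((a - center) * (b - center)
                    for a, b in zip(values[1:], values[:-1]))
    return numerator / sum((v - center) ** 2 for v in values)


# Synthetic history with known (made-up) parameters, so the fit can be checked.
rng = random.Random(1)
history = [1.25]
for _ in range(2000):
    history.append(0.5 + 0.6 * history[-1] + rng.gauss(0.0, 1.0))

const, coef, residuals = fit_ar1(history)
print("estimated constant and coefficient:", round(const, 2), round(coef, 2))
print("residual mean:", round(statistics.fmean(residuals), 4))
print("residual standard deviation:", round(statistics.pstdev(residuals), 2))
print("autocorrelation of the series:", round(lag1_autocorrelation(history), 2))
print("autocorrelation of the residuals:", round(lag1_autocorrelation(residuals), 2))
```

If the equation is adequate, the series itself is clearly autocorrelated while the residuals are not, and the residual standard deviation gives the size of the annual shocks to use in simulation.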

In principle, whether an equation passes that test determines whether unexplained changes in central tendency exist for a variable over time. If an equation for a variable generates residuals that appear to be random shocks, then arguably, no unexplained (random) changes in central tendency exist. All systematic movement in the variable has been captured by the equation, and there is nothing left to explain.⁽⁷⁾

Unfortunately, it is sometimes difficult to tell whether the processes being modeled show random changes in central tendency. The tests used to decide whether derived residuals look like random shocks are not definitive, especially when the time series is short. Thus, the process of deciding whether changes in central tendency have occurred can involve judgment. (As discussed below, CBO chose to use models with and without random changes in central tendency when the evidence seemed unclear.) If the possibility of a nonsystematic changing central tendency is admitted, the simplest approach is to "first difference" the variable in question--that is, to use an equation to describe the change in, rather than the level of, the variable.

Modeling change rather than level for a variable may seem like a trivial difference, but it has a profound effect on inferences about the bands of uncertainty around the variable. When change is modeled, any random shock permanently affects the level of the variable--the shock does not disappear by itself after one period, as in the usual specification. Of course, a random shock in the other direction pushes the level of the variable back in the other direction permanently. Thus, in first-differenced models, the level of a variable at any point in time is the result of cumulative shocks up to that point. Because shocks are all random, any cumulation in one direction pushes the level toward a new central tendency. Thus, uncertainty bands grow over time.
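The contrast between the two specifications can be seen in a small simulation. The sketch below assumes a lag coefficient of 0.5 and unit-variance normal shocks, both chosen arbitrarily; it compares the width of the 5th-to-95th percentile band in year 10 and year 75 when the same equation is applied to the level of a variable and when it is applied to the change in the variable.

```python
import random
import statistics


def simulate_paths(coef, shock_sd, n_years, n_paths, difference, seed=0):
    """Simulate paths for the level of a variable.

    difference=False: the equation describes the level itself.
    difference=True:  the equation describes the annual change, and the
                      level is the running sum of those changes.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        level, change, path = 0.0, 0.0, []
        for _ in range(n_years):
            shock = rng.gauss(0.0, shock_sd)
            if difference:
                change = coef * change + shock
                level += change          # every shock stays in the level
            else:
                level = coef * level + shock  # shocks fade as coef**k
            path.append(level)
        paths.append(path)
    return paths


def band_width(paths, year):
    """Width of the 5th-to-95th percentile range across paths (year is 1-based)."""
    cuts = statistics.quantiles([p[year - 1] for p in paths], n=20)
    return cuts[-1] - cuts[0]


level_paths = simulate_paths(0.5, 1.0, n_years=75, n_paths=2000, difference=False)
diff_paths = simulate_paths(0.5, 1.0, n_years=75, n_paths=2000, difference=True, seed=1)

for label, paths in (("level model", level_paths), ("differenced model", diff_paths)):
    print(label, round(band_width(paths, 10), 2), round(band_width(paths, 75), 2))
```

Under these assumptions the band for the level model is about the same width in both years, whereas the band for the differenced model is much wider in year 75 than in year 10.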

Conclusions about how first-differenced equations differ from level equations result to some extent from how inputs to the model are specified. For example, Figure 5 in Chapter II suggests (and statistical tests confirm) that the growth rate of real wages had no change in central tendency in the past five decades. However, a picture of the average level of real wages (which has risen steadily over time) shows a clear change in central tendency.⁽⁸⁾ The growth rate of real wages is effectively a "differenced" version of the level of real wages--if that level were the input being modeled, tests would indicate changes in central tendency. The same relationship exists between the rate of inflation and the price level or between the change in mortality and the central death rate at a particular time.

Using Time-Series Equations to Generate Annual Probability Distributions

Once analysts have produced mathematical equations for an input, they can generate probability distributions for actual annual outcomes. The simplest time-series models imply that annual values depend only on a constant, on the previous period's value (multiplied by a coefficient), and on an annual random shock. Coefficients are generated when the time-series model is estimated using historical data. The extent to which an input varies around the value predicted by the equation indicates the correct size for annual random shocks. Thus, everything is in place to project future values using computer simulation.

The two methods for using estimated equations to generate probability distributions both involve solving repeatedly for annual outcomes using random draws of annual shocks. The first approach is Monte Carlo simulation, which assumes that random shocks are symmetric and follow some distribution--in this case, a normal, or bell-shaped, pattern. Thus, for each year, the mathematical equation for the normal distribution, together with computer-generated random numbers, can be used to pick values for random shocks.

The second approach is "bootstrap" simulation, so-called because it involves "picking itself up by its own bootstraps." The idea is to use actual residuals generated during the estimation phase to perform the simulation. If 100 data points are used in the estimation, a randomly chosen shock can be selected each year of the simulation, and the probability of drawing any given historical shock is one in 100. The bootstrap approach is a useful alternative to Monte Carlo simulation because it does not require assumptions about the shape of the probability distribution for random shocks.
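The difference between the two ways of drawing shocks can be sketched as follows. The residuals listed are hypothetical values made up for illustration (they are not CBO's estimates), and the function names are introduced here.

```python
import random


def monte_carlo_shocks(shock_sd, n_years, rng):
    """Draw shocks from a normal distribution centered on zero."""
    return [rng.gauss(0.0, shock_sd) for _ in range(n_years)]


def bootstrap_shocks(residuals, n_years, rng):
    """Draw shocks by resampling historical residuals with replacement.

    Each historical residual is equally likely to be picked in each year.
    """
    return [rng.choice(residuals) for _ in range(n_years)]


# Hypothetical residuals from an estimated equation (illustrative only).
residuals = [-1.8, -0.9, -0.4, -0.1, 0.0, 0.2, 0.3, 0.5, 0.8, 1.4]

rng = random.Random(0)
normal_draws = monte_carlo_shocks(shock_sd=1.0, n_years=75, rng=rng)
resampled_draws = bootstrap_shocks(residuals, n_years=75, rng=rng)

print("first Monte Carlo draws:", [round(s, 2) for s in normal_draws[:5]])
print("first bootstrap draws:", resampled_draws[:5])
```

Every bootstrap draw is one of the observed residuals, so any asymmetry or fat tails in the historical shocks carry over to the simulation without a distributional assumption.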

With Monte Carlo and bootstrap simulations, projecting forward is simple once all of the pieces are in place. Both simulations start with the last actual value, then draw a random value for the annual shock and add that shock to the coefficient multiplied by the last actual value. For the next--second--period, the process is repeated; however, this time the coefficient is multiplied by the outcome for the first simulation period. Thus, annual autocorrelations are built into the projection equation. The process is repeated for each year of the simulation period (in this case, the 75-year projection period used by SSA).
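The recursion described above can be written in a few lines. This is a minimal sketch of the one-lag case; the constant (0.4), coefficient (0.6), starting value, and shocks are placeholders rather than estimated values.

```python
def project_path(last_actual, const, coef, shocks):
    """Solve y[t] = const + coef * y[t-1] + shock[t] forward, one year per shock.

    The first year uses the last actual value; every later year uses the
    simulated outcome of the year before it.
    """
    path = []
    previous = last_actual
    for shock in shocks:
        previous = const + coef * previous + shock
        path.append(previous)
    return path


# Two years with given shocks (placeholder numbers).
path = project_path(last_actual=2.0, const=0.4, coef=0.6, shocks=[0.1, -0.2])
print([round(v, 2) for v in path])

# With no shocks at all, a 75-year path settles at const / (1 - coef).
calm_path = project_path(last_actual=2.0, const=0.4, coef=0.6, shocks=[0.0] * 75)
print(round(calm_path[-1], 4))
```

The shocks can come from either the Monte Carlo or the bootstrap procedure; the projection step is the same.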

Each Monte Carlo or bootstrap simulation yields a possible set of annual outcomes for the variable in question. To move from that set of annual outcomes to a probability distribution for the outcomes in a given year, the process must be repeated many times. The most likely annual outcomes (those near the central tendency) will be realized in many more of the simulations than unlikely annual outcomes (those far away from the central tendency) will be. That principle serves as the basis for inferring probability distributions--if a given outcome occurs five out of 100 times when the random sequence is generated, the probability of that outcome is 5 percent. If some other value occurs 10 times, it is assigned a probability of 10 percent.
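The counting principle can be illustrated by repeating a simulation many times and tallying how often the final-year outcome lands in a given interval. The parameters below (constant 0.4, coefficient 0.6, unit-variance normal shocks, 1,000 paths) are assumptions chosen for illustration.

```python
import random


def simulate_final_values(const, coef, shock_sd, start, n_years, n_paths, seed=0):
    """Run n_paths simulations and keep each path's outcome in the last year."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        value = start
        for _ in range(n_years):
            value = const + coef * value + rng.gauss(0.0, shock_sd)
        finals.append(value)
    return finals


def empirical_probability(outcomes, low, high):
    """Share of simulated outcomes that fall in the interval [low, high)."""
    return sum(low <= v < high for v in outcomes) / len(outcomes)


finals = simulate_final_values(const=0.4, coef=0.6, shock_sd=1.0,
                               start=1.0, n_years=75, n_paths=1000)
near = empirical_probability(finals, 0.5, 1.5)   # around the central tendency of 1.0
far = empirical_probability(finals, 3.5, 4.5)    # well away from it

print("share of paths ending near the central tendency:", near)
print("share of paths ending far from it:", far)
```

Outcomes near the central tendency appear in a much larger share of the simulated paths than outcomes far from it, and those shares are the estimated probabilities.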

PROBABILITY DISTRIBUTIONS FOR DEMOGRAPHIC AND ECONOMIC INPUTS

Time-series analysis is ideal for assigning probability distributions to future values for the nine inputs used in Social Security's financial calculations. The time-series technique allows analysts to infer the likelihood of all possible outcomes solely on the basis of the historical data. Using the technique involves testing various equation specifications for random changes in central tendency, plus one other structural consideration--some variables are modeled in groups, rather than separately, because there are correlations between the outcomes that are likely to continue in the future.

In this analysis, the only input that is modeled using the simplest time-series specification is real wage growth: its model has only one equation and no correlations with other variables (because productivity is not inherently related to any of the other inputs). Mortality improvement would also be simple if separate rates were not being modeled for 21 different age groups for each sex, and if correlations between the separate age-specific error terms were not required. Unemployment, inflation, and the real interest rate are clearly related in the historical data. Thus, those three inputs are modeled in a joint process in which the outcome for any one variable directly affects the other two.

Immigration, disability incidence, and disability termination are all greatly affected by changes in law or policy. As a result, it does not seem appropriate to say that their evolution involves randomly changing central tendencies. With the time-series approach, however, that is one way in which the data can be interpreted.

In the case of fertility, past patterns suggest that outcomes are highly correlated over time, implying that shocks are temporary but last for several decades. However, a reasonable alternative interpretation is that fundamental (and permanent) changes in the central tendency for fertility have occurred before--at the end of the baby boom, for instance--and thus may occur again.

For this analysis, CBO examined two measures of uncertainty around the expected values for the nine inputs. The first measure is the 5th and 95th percentiles of the values in each year. For example, CBO found that the rate of real wage growth in 2050 was less than -2.38 percent in 5 percent of the simulations and greater than 4.34 percent in another 5 percent of the cases. That range represents the annual variation for that year. The second measure of uncertainty is the 5th and 95th percentiles of the average value over a specific period. For example, CBO computed average wage growth between 2000 and 2050 for each simulation and then looked at the distribution of those averages.

As expected, average values vary less than annual values do. (For instance, though a reasonable chance exists that the economy will be in a depression in any given year, very little chance exists for a five-year depression and even less chance for a 20-year depression.) In the case of real wage growth, CBO found that average growth between 2000 and 2050 was less than 0.13 percent in 5 percent of the simulations and greater than 1.90 percent in another 5 percent. That range is much narrower than the -2.38 percent to 4.34 percent range for the annual variation in 2050.
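Both measures can be computed from the same set of simulated paths. The sketch below assumes a one-lag equation with placeholder parameters (not CBO's estimates for real wage growth) and reports the 5th and 95th percentiles of the final-year value and of the average over the whole period.

```python
import random
import statistics


def percentile_band(values):
    """Return the 5th and 95th percentiles of a list of simulated values."""
    cuts = statistics.quantiles(values, n=20)
    return cuts[0], cuts[-1]


def annual_and_average_bands(const, coef, shock_sd, start, n_years, n_paths, seed=0):
    """Bands for the last year's value and for the average over all years."""
    rng = random.Random(seed)
    finals, averages = [], []
    for _ in range(n_paths):
        value, total = start, 0.0
        for _ in range(n_years):
            value = const + coef * value + rng.gauss(0.0, shock_sd)
            total += value
        finals.append(value)
        averages.append(total / n_years)
    return percentile_band(finals), percentile_band(averages)


annual, average = annual_and_average_bands(const=0.4, coef=0.6, shock_sd=1.0,
                                           start=1.0, n_years=50, n_paths=1000)
print("annual band:", tuple(round(v, 2) for v in annual))
print("average band:", tuple(round(v, 2) for v in average))
```

Because high and low years partly offset one another within each path, the band for the average is considerably narrower than the band for any single year.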

The graphs that appear in the rest of this chapter illustrate uncertainty for the nine input assumptions. Those graphs include five lines:

The solid line in the center represents SSA's intermediate projection;
The solid lines to either side of it show the 5th and 95th percentiles of annual values for the 1,000 paths generated by the Monte Carlo simulations (suggesting that the outcome in any given year will fall between those bands 90 percent of the time); and
The dotted lines show the 5th and 95th percentiles of the average values (from 2000 through the year in question).

For most of the variables, the range between the 5th and 95th percentile values of the 75-year averages is very similar to the range between SSA's high- and low-cost long-term values (see Table 4). Those ranges are not strictly comparable because SSA's long-term values begin as late as 2025, whereas CBO's values cover all 75 years of the projection period. Still, that similarity is striking considering that SSA's high- and low-cost values have no explicit statistical interpretation.

TABLE 4. RANGES OF UNCERTAINTY FOR INPUTS
Input	SSA's Expected Value	Long-Term Variation, SSA^a	Long-Term Variation, CBO^b

Fertility Rate (Number of children per woman)	1.95	0.25	0.36
Rate of Mortality Improvement (Percentage reduction in the mortality rate)	0.68	0.45	0.36
Immigration Level (Thousands of people)	900	278	249
Rate of Real Wage Growth (Percent)	1.00	0.50	0.73
Inflation Rate (Percentage change in the consumer price index)	3.30	1.00	1.36
Unemployment Rate (Percent)	5.50	1.00	0.70
Real Interest Rate on Social Security Assets (Percent)	3.00	0.75	0.48
Disability Incidence Rate (Percent)	0.50^c	0.07^c	0.04
Disability Termination Rate (Percent)	3.83^d	0.77^d	0.42

SOURCES: Social Security Administration; Congressional Budget Office.
a. SSA's variation is half of the difference between the ultimate values for each input in the high- and low-cost scenarios.
b. CBO's variation is half of the difference between the 5th and 95th percentile values for the 2000-2075 average for each input, based on 1,000 Monte Carlo simulations using CBO's Long-Term Actuarial Model.
c. SSA's actuaries set separate rates of change for disability incidence for men and women relative to a base period of 1980 to 1984. The rates reported here (which are adjusted for age and sex) are from CBO's Long-Term Actuarial Model (LTAM), are relative to 2000, and generate disability incidences that match those from SSA.
d. SSA's actuaries set separate rates of change for disability termination by recovery and by death relative to a base period of 1977 to 1980; within the category of termination by death, they set separate rates for men and women. The rates reported here (which are adjusted for age and sex) are from LTAM, are relative to 2000, and generate disability terminations that match those from SSA.

Real Wage Growth

SSA's intermediate assumption for the growth of real wages is 1.0 percent per year over the 75-year projection period. The time-series technique suggests that considerable variation around that value can be expected (see Figure 11).⁽⁹⁾

FIGURE 11.
UNCERTAINTY BANDS FOR THE RATE OF REAL WAGE GROWTH

SOURCES: Department of Commerce, Bureau of Economic Analysis; Social Security Administration, Office of the Chief Actuary; Congressional Budget Office.

NOTE: Annual uncertainty bands show the 90 percent confidence range for a given year. Average uncertainty bands show the 90 percent confidence range for the average of 2000 through a given year.

The equation used to generate paths for real wage growth is the most basic time-series specification. The rate of real wage growth is regressed on a constant and its own lagged value. The resulting error terms pass the test for stable central tendency. Thus, no need exists to explore other specifications. (See the appendix for estimates of the coefficients and values for the various test statistics.)

The 90 percent uncertainty bands for the projection of annual real wage growth cover a range of roughly 3.4 percentage points in each direction (in 2050, for example, from -2.38 percent to 4.34 percent around the expected value of 1.0 percent). The range for average values narrows to only about 0.7 percentage points in each direction by 2075, which is the same order of magnitude as SSA's high/low variation of 0.5 percentage points.

Mortality Improvement

SSA projects rates of mortality improvement for both men and women in each of 21 separate age groups. Historical data suggest that the rates of improvement for each sex are somewhat correlated between age groups but that differences in central tendency exist within age/sex groups and should be accounted for. Thus, CBO estimated separate time-series equations for mortality improvement in the 21 age groups of each sex, but the equations were estimated such that correlations in annual random shocks could be accommodated (see the appendix for more details).⁽¹⁰⁾
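One common way to generate shocks that are correlated across equations is to build them from independent normal draws. The sketch below does this for just two groups with an assumed correlation of 0.6; it illustrates the general idea only and is not the estimation method described in the appendix.

```python
import math
import random


def correlated_shock_pair(sd_a, sd_b, correlation, rng):
    """Draw one pair of normal shocks with the requested correlation."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    shock_a = sd_a * z1
    # Mixing z1 into the second shock induces the correlation.
    shock_b = sd_b * (correlation * z1 + math.sqrt(1.0 - correlation ** 2) * z2)
    return shock_a, shock_b


def sample_correlation(pairs):
    """Sample correlation coefficient for a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)
# Assumed values: two age groups, shock sizes 1.0 and 2.0, correlation 0.6.
pairs = [correlated_shock_pair(1.0, 2.0, 0.6, rng) for _ in range(5000)]
print("sample correlation:", round(sample_correlation(pairs), 2))
```

With more than two groups, the same idea is usually implemented with a factorization of the full covariance matrix of the residuals.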

Like the overall average of rates of mortality improvement, which can be aggregated over age and sex to generate a graph of how mortality is expected to change, uncertainty bands can also be aggregated and graphed (see Figure 12). The 90 percent range of annual outcomes around the expected rate of improvement (0.7 percent per year) is quite large and is consistent with the significant historical variation. As expected, the range of average values is much smaller.

FIGURE 12.
UNCERTAINTY BANDS FOR THE OVERALL RATE OF MORTALITY IMPROVEMENT

SOURCES: Social Security Administration, Office of the Chief Actuary; Congressional Budget Office.

NOTE: Annual uncertainty bands show the 90 percent confidence range for a given year. Average uncertainty bands show the 90 percent confidence range for the average of 2000 through a given year.

As noted earlier, although rates of mortality improvement pass the test for nonrandomness in central tendency, the level of mortality (as represented by the central death rate) can be thought of as a variable with random central tendency. In other words, the input assumption being modeled is already the first-differenced version of a variable with expanding uncertainty ranges. Thus, although the uncertainty bands for the rate of mortality improvement are constant over time, a graph of the bands for central death rates would show increasing uncertainty.

Unemployment, Inflation, and the Real Interest Rate

To understand why unemployment, inflation, and the real interest rate are estimated as a system of equations rather than independently, consider the effects of not doing so. In a model simulation, time-series equations for each variable would adequately generate the historical variability and correlation between annual outcomes for that variable. But the fact that outcomes among the three variables are correlated would not be captured. Thus, that technique could, in principle, violate a basic condition of model simulations and generate a combination of outcomes that would never have occurred historically.

As an example of such correlations, real interest rates were negative in the 1970s because inflation skyrocketed and nominal interest rates did not catch up for some time. Because no other good reason exists for real interest rates to become negative, no analyst would want to predict a negative real interest rate unless the underlying cause was a surge of inflation. Likewise, well-established correlations between inflation and unemployment have been observed in the annual data; they are generally attributed to fluctuations in aggregate demand. An analyst would not want to accidentally simulate high-frequency positive correlations between inflation and unemployment when the data suggest a strong negative correlation.

The technique that CBO used to simultaneously model the three variables builds directly on the basic time-series approach. But rather than simply regressing a variable on its own lagged value, each equation includes lagged values for all of the variables under consideration.⁽¹¹⁾ Thus, for example, the equation for unemployment includes lagged values for unemployment, the real interest rate, and inflation over the previous two years (see the appendix for more details). The correlations between each variable and its own lagged values are generally positive. The effect of each variable on the other two differs over time but generally reflects well-known properties (such as the short-term negative relationship between inflation and real interest rates described above).
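The structure of such a system can be sketched with a single function that advances all three variables by one year. Every number below is a placeholder chosen for illustration, not one of CBO's estimated coefficients; the negative coefficient on lagged inflation in the real-interest-rate equation is assumed only to echo the relationship described in the text, and correlation among the same-year shocks is left out for brevity.

```python
def var_step(const, lag_matrices, history, shocks):
    """Advance a system of equations by one year.

    history holds past years, oldest first; lag_matrices[0] multiplies the
    most recent year and lag_matrices[1] the year before it. Row i of each
    matrix holds the coefficients in the equation for variable i.
    """
    n = len(const)
    new_values = []
    for i in range(n):
        value = const[i] + shocks[i]
        for lag, matrix in enumerate(lag_matrices, start=1):
            past = history[-lag]
            value += sum(matrix[i][j] * past[j] for j in range(n))
        new_values.append(value)
    return new_values


# Variable order: [unemployment, inflation, real interest rate].
CONST = [1.0, 1.0, 1.0]
LAG1 = [[0.6, -0.1, 0.0],
        [-0.2, 0.5, 0.0],
        [0.0, -0.3, 0.5]]
LAG2 = [[0.1, 0.0, 0.0],
        [0.0, 0.1, 0.0],
        [0.0, 0.0, 0.1]]
LAGS = [LAG1, LAG2]
NO_SHOCK = [0.0, 0.0, 0.0]

history = [[5.5, 3.3, 3.0], [5.5, 3.3, 3.0]]   # two most recent years

# Year 1 with and without a 2-point inflation shock, then year 2 with no shocks.
shocked_1 = var_step(CONST, LAGS, history, shocks=[0.0, 2.0, 0.0])
baseline_1 = var_step(CONST, LAGS, history, shocks=NO_SHOCK)
shocked_2 = var_step(CONST, LAGS, history + [shocked_1], shocks=NO_SHOCK)
baseline_2 = var_step(CONST, LAGS, history + [baseline_1], shocks=NO_SHOCK)

print("effect on next year's real interest rate:",
      round(shocked_2[2] - baseline_2[2], 2))
```

Because each equation includes lags of all three variables, a shock to inflation in one year changes the simulated real interest rate and unemployment rate in later years, which is how the joint model keeps simulated combinations consistent with historical ones.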

The uncertainty bands for annual values of unemployment, inflation, and the real interest rate are much larger than the bands for average values (see Figures 13, 14, and 15). For annual values, the range for the unemployment rate is 2 to 3 percentage points in each direction, and the ranges for inflation and the real interest rate are about 4 percentage points in each direction. For 75-year average values, the ranges for inflation and unemployment do not differ much from those in SSA's high- and low-cost scenarios (see Table 4). However, CBO's range for the real interest rate is considerably narrower than SSA's. (The 1999 Technical Panel of the Social Security Advisory Board suggested an even larger range--0.75 percentage points in each direction.)

FIGURE 13.
UNCERTAINTY BANDS FOR THE UNEMPLOYMENT RATE

SOURCES: Department of Labor, Bureau of Labor Statistics; Social Security Administration, Office of the Chief Actuary; Congressional Budget Office.

NOTE: Annual uncertainty bands show the 90 percent confidence range for a given year. Average uncertainty bands show the 90 percent confidence range for the average of 2000 through a given year.

FIGURE 14.
UNCERTAINTY BANDS FOR THE INFLATION RATE

SOURCES: Department of Labor, Bureau of Labor Statistics; Social Security Administration, Office of the Chief Actuary; Congressional Budget Office.

NOTE: Annual uncertainty bands show the 90 percent confidence range for a given year. Average uncertainty bands show the 90 percent confidence range for the average of 2000 through a given year.

FIGURE 15.
UNCERTAINTY BANDS FOR THE REAL INTEREST RATE ON ASSETS IN THE SOCIAL SECURITY TRUST FUNDS

SOURCES: Social Security Administration, Office of the Chief Actuary; Congressional Budget Office.

NOTE: Annual uncertainty bands show the 90 percent confidence range for a given year. Average uncertainty bands show the 90 percent confidence range for the average of 2000 through a given year.

The unemployment rate is the first example in this analysis of a "bounded" input (one that is naturally restricted to a certain range). If the unemployment rate were modeled simply as a level variable, random shocks that led to negative unemployment could be chosen--which is, of course, impossible in reality. Thus, the estimated equation coefficients are based on a "transformed" version of the unemployment rate that is restricted between zero and one (see the appendix).⁽¹²⁾
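The idea of a bounding transformation can be illustrated with the logistic (logit) transform, one common choice for rates that must lie between zero and one. That choice is an assumption made here for illustration; the specific transformation CBO used is described in the appendix and may differ.

```python
import math


def to_unbounded(rate):
    """Map a rate strictly between 0 and 1 onto the whole real line (logit)."""
    return math.log(rate / (1.0 - rate))


def to_bounded(value):
    """Inverse of to_unbounded: map any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


unemployment = 0.055                    # 5.5 percent, expressed as a fraction
z = to_unbounded(unemployment)

# Shocks are applied in transformed units; even a very large negative shock
# maps back to a small positive rate rather than a negative one.
shocked = to_bounded(z - 3.0)

print("transformed value:", round(z, 3))
print("rate after a large negative shock:", round(shocked, 4))
```

The equation is estimated and simulated in the transformed units, and each simulated value is converted back to a rate before it is used.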

Immigration and Disability Incidence and Termination

Annual levels of legal immigration and rates at which people join and leave the Disability Insurance program are set directly in law or influenced strongly by policy changes. Historical data for each of those variables show clear indications of changing central tendencies. However, is it appropriate to think of those changes as random when they are determined to some extent by shifts in policy? The answer to that question determines which specification is appropriate for the three variables. The approach that CBO used was to model the processes without random changes in central tendency, so that variation over time is attributed only to random shocks and correlation. To the extent that such variation results from changes in law, it will be overestimated.

Applying the standard time-series approach to those three variables produces significant uncertainty bands (see Figures 16, 17, and 18). The equation for immigration is somewhat more complicated than the standard (one-lag) time-series model because a clear trend in the level of immigration is apparent over time (see the appendix). The wide error bands for both annual and average values for immigration are consistent with the large autocorrelation, which magnifies shocks over time (although the initial shock eventually fades away).

FIGURE 16.
UNCERTAINTY BANDS FOR THE LEVEL OF IMMIGRATION

SOURCES: Social Security Administration, Office of the Chief Actuary; Congressional Budget Office.

NOTES: Annual uncertainty bands show the 90 percent confidence range for a given year. Average uncertainty bands show the 90 percent confidence range for the average of 2000 through a given year.

Recent historical data for the level of immigration are unavailable.

FIGURE 17.
UNCERTAINTY BANDS FOR THE RATE OF DISABILITY INCIDENCE

SOURCES: Social Security Administration, Office of the Chief Actuary; Congressional Budget Office.

NOTES: Annual uncertainty bands show the 90 percent confidence range for a given year. Average uncertainty bands show the 90 percent confidence range for the average of 2000 through a given year.

Recent historical data for the rate of disability incidence are unavailable.

FIGURE 18.
UNCERTAINTY BANDS FOR THE RATE OF DISABILITY TERMINATION

SOURCES: Social Security Administration, Office of the Chief Actuary; Congressional Budget Office.

NOTES: Annual uncertainty bands show the 90 percent confidence range for a given year. Average uncertainty bands show the 90 percent confidence range for the average of 2000 through a given year.

Recent historical data for the rate of disability termination are unavailable.

Measuring uncertainty bands for disability incidence and termination is more difficult because of the limited data--only about 25 years' worth--available for each input. Both equations fail the test for a stable time-series variable; however, because that failure is driven by known policy changes and a short data series, CBO ignored it for this analysis in order to generate fixed error bands. Those bands are quite large--for example, DI incidence roughly doubles between the high and low bands, which leads to a wide variation in estimates of Social Security's finances. (Figures 17 and 18 look different from the other figures because SSA's expected values vary. Thus, the uncertainty bands for average values do not always bracket the expected values.) In addition, the rates of both disability incidence and termination are naturally restricted between zero and one. Thus, like unemployment, those variables were estimated using a bounding transformation.

Fertility

Among the nine inputs, the fertility rate stands out in terms of its complexity and the potential randomness of its central tendency. The historical data support two different methods for modeling fertility: one without random changes in central tendency and the other with random changes. CBO assumed a base-case scenario that includes no changes in central tendency, but it also analyzed an alternative scenario with such changes.

Those two approaches provide different interpretations of the history of U.S. fertility rates since 1940. The base case explains the surge in fertility associated with the baby boom as a series of highly correlated shocks. An approach that assumes random changes in central tendency indicates that the baby-boom era and the post-1964 period have two very different central tendencies.

Fertility can also be modeled using a standard time-series approach that leads to stable error bands (see Figure 19).⁽¹³⁾ The estimated equation involves four lags for past fertility rates and a correlated error (moving-average) term (see the appendix for details). As suggested, the model explains the baby boom as a combination of annual shocks and highly correlated annual outcomes. Thus, the 90 percent range for fertility (roughly 1.0 to 3.0 children per woman) contains most of the data points associated with the baby boom.

FIGURE 19.
UNCERTAINTY BANDS FOR THE OVERALL RATE OF FERTILITY

SOURCES: Social Security Administration, Office of the Chief Actuary; Congressional Budget Office.

NOTE: Annual uncertainty bands show the 90 percent confidence range for a given year. Average uncertainty bands show the 90 percent confidence range for the average of 2000 through a given year.

Allowing random changes in central tendency has a simple effect on the standard time-series specification; however, the implications for the uncertainty bands are profound. In the random-change approach, the variable being modeled is first-differenced--a process akin to assuming perfect correlation in the levels of the variable. That means that every shock in the equation has a "permanent" effect until a shock in the other direction occurs.⁽¹⁴⁾ Thus, the variable tends to meander in one direction or another for long periods.

The error bands for annual values of fertility given a first-differenced specification show the possibility of meandering in either direction in the future (see Figure 20). The 90 percent range is much wider in 2075 than in the early years of the projection. Indeed, in the first few decades that band is narrower than the band of annual values for the standard specification (shown in Figure 19), but it grows annually by a fixed amount and eventually becomes wider than the standard band by about 0.5 in either direction.

FIGURE 20.
UNCERTAINTY BANDS FOR THE OVERALL RATE OF FERTILITY USING THE FIRST-DIFFERENCED SPECIFICATION

SOURCES: Social Security Administration, Office of the Chief Actuary; Congressional Budget Office.

NOTE: Annual uncertainty bands show the 90 percent confidence range for a given year. Average uncertainty bands show the 90 percent confidence range for the average of 2000 through a given year.

Even more striking is the difference in the shape of the error bands for average fertility values between the two specifications. For the standard specification, those bands narrow with time (see Figure 19). Thus, less uncertainty exists in the average value for 2000 through 2075 than in the average value for 2000 through 2020, for example. For the first-differenced specification, however, the bands for average values actually widen as shocks permanently affect the annual level of fertility and create scenarios with consistently higher or lower fertility than expected (see Figure 20).
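The contrast between the two specifications can be reproduced qualitatively with a small simulation. This is a sketch under stated assumptions, not CBO's model: it uses a one-lag equation for the standard specification and a pure random walk for the first-differenced one (rather than the estimated fertility equation), and the values of `beta` and `sigma` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_years = 5000, 75
beta, sigma = 0.8, 0.1  # hypothetical values, not CBO's estimates

shocks = rng.normal(0.0, sigma, size=(n_paths, n_years))

# Standard specification: deviations from the central tendency follow
# a one-lag autoregression, so each shock fades over time.
level = np.zeros((n_paths, n_years))
level[:, 0] = shocks[:, 0]
for t in range(1, n_years):
    level[:, t] = beta * level[:, t - 1] + shocks[:, t]

# First-differenced specification (simplest case): each shock shifts
# the level permanently, so the level is a cumulative sum of shocks.
walk = np.cumsum(shocks, axis=1)

def band_width(paths):
    """Width of the 5th-to-95th percentile range in each year."""
    lo, hi = np.percentile(paths, [5, 95], axis=0)
    return hi - lo

def running_average(paths):
    """Average of each path from the first year through each later year."""
    return np.cumsum(paths, axis=1) / np.arange(1, paths.shape[1] + 1)

results = {}
for name, paths in (("standard", level), ("first-differenced", walk)):
    annual = band_width(paths)
    average = band_width(running_average(paths))
    results[name] = (annual, average)
    print(name, "annual width (years 20, 75):", annual[[19, 74]].round(2),
          "average width (years 20, 75):", average[[19, 74]].round(2))
```

Under these assumptions, the band for the running average shrinks between year 20 and year 75 in the standard case and grows in the first-differenced case, which is the pattern described for Figures 19 and 20.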

After all of the estimation is complete, a sense of dissatisfaction about fertility (and possibly other inputs) remains. Using statistical techniques to forecast uncertainty in fertility implies the inclusion of a randomly changing central tendency. But how can the current annual fluctuation in the total fertility rate permanently affect all future outcomes? If that were the case, it would suggest that decisions about fertility were determined by random changes in the fertility rate, which is not consistent with any logical or theoretical explanation of the level of fertility.

Another way to view the uncertainty about fertility is to look at other factors that may have caused fluctuations over time. The Depression, World War II, the great postwar economic expansion, the discovery of cheap and effective birth control--all of those events had unpredictable and dramatic effects on the fertility rate. By predicting uncertainty that is consistent with past variation, CBO is implicitly assuming that such events could happen again. Conversely, if the recurrence of those types of events is ruled out, the estimates of uncertainty about fertility presented here are truly an upper bound.

1. A closely related technique is "bootstrap" simulation, in which random draws are made from the collection of actual shocks that occurred in the past rather than from a theoretical distribution of shocks inferred from historical data. Both techniques are used in this analysis.
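The difference between the two ways of drawing shocks can be sketched as follows. The residual values are made up for illustration; in practice they would come from an estimated equation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical historical shocks (residuals), not actual data.
historical_shocks = np.array([-0.8, 0.3, 1.1, -0.2, 0.5, -1.4, 0.0, 0.6, -0.1])
n_years = 75

# Monte Carlo: draw from a theoretical (here, normal) distribution whose
# standard deviation is inferred from the historical shocks.
sigma = historical_shocks.std(ddof=1)
monte_carlo_draws = rng.normal(0.0, sigma, size=n_years)

# Bootstrap: resample the actual historical shocks with replacement.
bootstrap_draws = rng.choice(historical_shocks, size=n_years, replace=True)

print("Monte Carlo draws take any real value:", monte_carlo_draws[:3].round(2))
print("Bootstrap draws repeat observed values:", bootstrap_draws[:3])
```

The bootstrap makes no assumption about the shape of the shock distribution, which is why it can serve as a check on the normality assumption.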

2. The time-series description of a series made up only of an average value and annual random shocks is "white noise."

3. Inflation is often described as a generalized autoregressive conditional heteroskedasticity (GARCH) process. In that type of process, the variance of errors increases with the level of the variable.

4. The simplest specification for a variable x_t is:

$x_t = \alpha + \beta x_{t-1} + \epsilon_t$

where t denotes time, $\alpha$ and $\beta$ are parameters to be estimated, and $\epsilon_t$ is the residual (unexplained error) at time t. As described in the text, $\alpha$ represents the central tendency, and $\beta$ captures the correlation of values over time. This type of equation can be estimated with standard regression techniques. Note that the derived residual ($\epsilon$)--which represents implied random shocks--is used to estimate the variance for the random-shock process that feeds into the Monte Carlo simulation described later in this chapter.
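A minimal sketch of that estimation step, assuming ordinary least squares on a simulated series: the "true" parameter values are hypothetical and serve only to check that the regression recovers them. Note that in this form of the equation the long-run mean of the series is alpha divided by (1 - beta) when beta is less than one in absolute value.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a hypothetical series with known parameters.
true_alpha, true_beta, true_sigma = 0.5, 0.7, 0.2
x = np.empty(500)
x[0] = true_alpha / (1.0 - true_beta)  # start at the long-run mean
for t in range(1, x.size):
    x[t] = true_alpha + true_beta * x[t - 1] + rng.normal(0.0, true_sigma)

# Regress x_t on a constant and x_{t-1} by ordinary least squares.
X = np.column_stack([np.ones(x.size - 1), x[:-1]])
y = x[1:]
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, y, rcond=None)

# The residuals are the implied random shocks; their standard deviation
# is what a later simulation step would draw from.
residuals = y - X @ np.array([alpha_hat, beta_hat])
sigma_hat = residuals.std(ddof=2)  # two estimated parameters

long_run_mean = alpha_hat / (1.0 - beta_hat)
print(f"alpha={alpha_hat:.3f} beta={beta_hat:.3f} "
      f"sigma={sigma_hat:.3f} long-run mean={long_run_mean:.3f}")
```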

5. In the language of time-series econometrics, a process is described in terms of its "AR" and "MA" properties, with "AR" denoting how many lagged terms are included in the equation and "MA" denoting how long the moving average is for the error terms. The simplest equation is an "AR(1)," which has only one lagged term and no moving average. The most complicated process in the list of Social Security inputs is an "ARMA(4,1)," meaning there are four lagged terms and a single-period moving average of errors.

6. The test for random shocks is based on the Durbin-Watson statistic. See the appendix for details.
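For reference, the Durbin-Watson statistic is the sum of squared year-to-year changes in the residuals divided by the sum of squared residuals. The sketch below computes it on two simulated residual series (hypothetical data, not CBO's): values near 2 suggest no first-order correlation, and values well below 2 suggest positive correlation.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared changes in the residuals over their sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

rng = np.random.default_rng(3)

# Independent shocks: the statistic should be close to 2.
independent = rng.normal(size=2000)

# Positively correlated residuals: the statistic should be well below 2.
correlated = np.empty_like(independent)
correlated[0] = independent[0]
for t in range(1, correlated.size):
    correlated[t] = 0.8 * correlated[t - 1] + independent[t]

dw_independent = durbin_watson(independent)
dw_correlated = durbin_watson(correlated)
print(f"independent: {dw_independent:.2f}, correlated: {dw_correlated:.2f}")
```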

7. A time-series econometrician would describe this as a "stationary" series. The standard test for stationarity is based on the augmented Dickey-Fuller statistic. See the appendix for details.

8. In the language of time-series econometrics, the level of real wages is not stationary, but the growth rate of real wages is.

9. The uncertainty bands from bootstrap simulations are very similar. The next chapter presents the bootstrap technique as an alternative to Monte Carlo simulation when solving the entire model, because using actual errors is a direct means for testing the assumption that residuals are distributed normally.

10. The basic concept is that each variable in the system of equations is unaffected by other variables but that the error terms are potentially correlated between the equations. Correlations between errors are measured after every equation in the system is estimated.
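One way to carry such cross-equation correlation into a simulation is to measure the covariance of the residuals and then draw shocks jointly from a distribution with that covariance. The sketch below assumes normally distributed shocks and uses made-up residuals for three equations; it is an illustration of the concept, not CBO's procedure.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical residuals from three separately estimated equations
# (rows are years, columns are equations).
residuals = rng.normal(size=(40, 3))
residuals[:, 1] += 0.6 * residuals[:, 0]  # build in some correlation

# Measure the covariance between the equations' errors after estimation.
cov = np.cov(residuals, rowvar=False)

# Draw simulated shocks jointly so they share that covariance structure.
simulated = rng.multivariate_normal(np.zeros(3), cov, size=20000)

print("measured covariance:\n", cov.round(2))
print("covariance of simulated shocks:\n",
      np.cov(simulated, rowvar=False).round(2))
```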

11. That technique is known as vector autoregression (VAR).

12. The transformation involves taking the log-odds ratio: if u is the unemployment rate, the variable being modeled is x = log(u/(1-u)). No matter what the shocks to x, the outcome of u is bounded between zero and one.

13. Fertility is naturally bounded from below (the rate cannot drop below zero); however, using a bounding transformation requires setting limits in both directions. The uncertainty bands in Figure 19 are based on an (arbitrary) upward limit of 4.0 children per woman.

14. Building on footnote 4, the specification for a variable x_t is:

$(x_t - x_{t-1}) = \alpha + \beta (x_{t-1} - x_{t-2}) + \epsilon_t$

where t denotes time, $\alpha$ and $\beta$ are parameters to be estimated, and $\epsilon_t$ is the residual (unexplained error) at time t. That type of equation can also be estimated with standard regression techniques.
