SUSPENDED-SEDIMENT DATABASE
Daily Values of Suspended Sediment and Ancillary Data

Table of Contents


BIAS CORRECTION

Retransformation methods in regression models used to estimate sediment loads

Estimates of suspended-sediment loads are often derived from periodic data using regression models. Many of the regression models used involve transformation into logarithmic space but final results are often required to be in original engineering units; therefore, retransformation of load data is needed. This retransformation involves a "bias correction problem" that has received much attention.

This report provides examples of three methods of handling bias correction; (1) the Quasi-Maximum Likelihood Estimator (QMLE), (2) the Minimum Variance Unbiased Estimator (MVUE), and (3) the Smearing Estimator (SM). Two of these methods are recommended by Cohn and Gilroy (1991). These two methods are the Minimum Variance Unbiased Estimator (MVUE) for use when the distribution of errors is assumed normal and the Smearing Estimator (SM) for situations in which non-normal error distribution is identified. Although the focus here is on an appropriate bias correction factor, it is well worth emphasizing that miss-specification of the appropriate regression model in a particular situation can yield sizable errors and render any care taken in correcting for bias as a useless exercise.

EXAMPLE PROBLEM FOR BIAS CORRECTION

Three methods of computing the bias correction factor are shown in this example problem. Attempts are made to conform to symbols and nomenclature used by Cohn and Gilroy, 1991.

At a gaging station instantaneous streamflow and sediment data were collected (Table 1). Streamflow and sediment discharge are reported in metric units and natural logarithms are used throughout the example.

Table 1: Instantaneous values of streamflow (Q) and sediment discharge 
		obtained from a sampling program. 
----------------------------------------------------------------------
Obser-  Streamflow   Sediment       Natural       lnQ - mean lnQ 
vation   (Q)         Discharge        Log    
        (m3/s)       (metric tons)   of (Q) 
----------------------------------------------------------------------
1       125.44       12698.0        4.83183         1.09279   
2       70.0         1287.9         4.24850         0.50946   
3       68.86        2575.9         4.23208         0.49304   
4       27.86        197.7          3.32719         -0.41185  
5       41.44        1623.5         3.72425         -0.01479  
6       13.38        25.4           2.59376         -1.14528  
7       24.92        87.1           3.21567         -0.52337  
                                                              
                                                              
----------------------------------------------------------------------
                               Equation = 3.7390     Equation=3.45225

= QBAR = QVAR ----------------------------------------------------------------------

A log-log regression between the instantaneous values of sediment discharge and streamflow is applied to the data in Table 1 and is shown in Figure 1. A considerable amount of analysis has convinced us that this is the correct model and the equation for the above data is:

Equation (1)

where: Equation = instantaneous sediment discharge or load estimated

from the transport (rating) curve in metric tons per day

Q = instantaneous streamflow discharge in m3/sec

0.020 = the exponentiated intercept (-3.901).

The mean square error of the regression (s2) is 0.3465.

Equation

Figure 1. -- The relation between sediment discharge and streamflow for the example.

From the daily streamflow record at this gaging station we want to compute sediment loads for a 14 day period. Although there are several ways to accomplish this task, we will use equation 1 to compute a daily sediment load and sum these daily loads for the 14 day period. In doing this we have assumed the relation between instantaneous and daily values are equivalent. This is not always the case. Table 2 shows the 14 day series of daily streamflow and the predicted daily sediment discharge computed from eq. 1.

 

Table 2: Daily values of streamflow and predicted daily sediment discharge for 
		a 14 day study period. 
---------------------------------------------
                  Daily mean    Daily Sediment
                  Discharge     Discharge
Obs  Month   Day  (m3/sec)      (metric tons/   
                                 day)            
---------------------------------------------
1      1      21   13.44       26.03           
2      1      22   17.64       55.13           
3      1      23   40.60       550.25          
4      1      24   26.88       176.30          
5      1      25   22.18       103.73          
6      1      26   21.59       96.29           
7      1      27   26.88       176.30          
8      1      28   43.68       673.30          
9      1      29   60.20       1631.99         
10     1      30   37.52       442.58          
11     1      31   27.86       194.62          
12     2      1    25.90       159.12          
13     2      2    24.92       143.05          
14     2      3    24.92       143.05          
---------------------------------------------
Total  ---    ---   ---        4571.74         
                                               
---------------------------------------------

1) The Quasi-Maximum Likelihood Estimator (QMLE)

Having recognized the bias of the rating curve method, Ferguson (1986) recommends:


Equation (2)

where: Equation = estimated sediment discharge (load) using the quasi-maximum likelihood estimator (QMLE)

LRC = Sediment load estimated from the rating curve

s2 = mean square error of the regression

Therefore, in the example the total load computed from the regression (LRC) for the 14 day period is 4,571.74 metric tons and the mean square error of the regression (s2) is 0.3465. Inserting into equation 2 yields:

Equation (3)


= 5436.6 metric tons

Although used extensively, this method is generally not recommended because the results are not unbiased estimates and the method often over-corrects for the downward bias of the rating curve (Cohn and Gilroy, 1991, p.48). Methods #2 and #3 described below are suggested for use by Cohn and Gilroy, 1991. This QMLE method may be suitable if the predicted discharges are within the interval of a fairly large calibration data set and the sample mean square error (s2) is a satisfactory estimator for the population mean square error.

2) The Smearing Estimator (Duan, 1983)

The Smearing estimator is a nonparametric method which is based on the equation:

Equation (4)
where: Equation = estimated sediment discharge (load) using the smearing estimator

ei = residuals from least squares regression from Eq. 1 and are the differences in the natural logarithms of measured and computed sediment discharge.

In the example there are seven observations in the regression equation (Eq. 1). The residuals are shown in Table 3.

Table 3: Residuals from the example problem regression equation. 
---------------------------------------------------------
             Natural Log of
      ------Sediment Discharge---------|-----Residual----
Obs    Observed              Predicted     ei      exp(ei)  
---------------------------------------------------------
1      9.4492                9.4238     0.0254    1.0257   
2      7.1608                7.8138     -0.6531   0.5205   
3      7.8540                7.7685     0.0854    1.0892   
4      5.2868                5.2710     0.0157    1.0158   
5      7.3923                6.3669     1.0254    2.7883   
6      3.2347                3.2468     -0.0120   0.9881   
7      4.4671                4.9632     -0.4961   0.6089   
Total  ---                   ---        ---       8.0365   
                                                           
---------------------------------------------------------
______________________________________________________________________

Equation 4 is:

Equation (5)

= 5248.7 metric tons

This is a nonparametric method that only requires the assumption that the residuals are independent and identically distributed (homoscedastic) but they can follow any distribution. When residuals are normally distributed it performs nearly as well as MVUE (example no. 3). Also, this estimator can be generalized to transformations other than the logarithm. For example, if the transformation function is the square root or inverse then the function used is the inverse of the selected transformation -- the square or inverse.

3) The Minimum Variance Unbiased Estimator (MVUE) also called the Bradu-

Mundlak Estimator

To apply this method, the bias correction is applied to each daily discharge (Q*) for the 14 day period using the following expression:

Equation (6)

and:

Equation

where: Equation = estimated sediment discharge (load) for the day using the minimum variance unbiased estimator

LRC(t) = Sediment load estimated from the transport (rating) curve for each day (t)

gm = a function introduced by Finney (1951) and used by Bradu and Mundlak (1970). This function can be evaluated by the computer program (Cohn and others, 1989, p. 939) in this Appendix.

m = degrees of freedom of the regression equation

V = an estimate of the variability at a given value of

streamflow discharge and computed by eq. 7

Computation of V is:

Equation (7)

where: Q* = daily mean streamflow for the day loads are being predicted

Q = instantaneous streamflow used in the regression (logarithms are shown in Table 1)

N = number of data points in the regression

The minimum variance unbiased estimator (MVUE) can be evaluated using the Fortran program attached to the end of this example. This computer program evaluates Equation 6 and 7 along with the expression gm. The computer program requires an input file called MVUE.IN containing a single column of streamflow discharges (column 4 of Table 2) for which the predicted daily sediment discharge needs to be corrected for bias. The output for the program is a file, MVUE.OUT, which contains the bias corrected daily sediment discharges and the total sediment discharge for the period. Output for the example program is shown in Table 4. The program requires input variables including XN, QBAR, QVAR, XM, S2, A, and B. These variables are defined in the program and values from the example problem for these variables are included in the program as default values. Different values can be added from the terminal and the default values can be changed by editing the program and re-compiling.

Table 4: Output for MVUE from computer program 
------------------------------------------------------
       Daily 
         Q       V     Argument*     gm    LMVUE                                                         
Obs    m3/s    (eq.7)   of gm()            (eq.6)
------------------------------------------------------
1      13.44   0.5198   0.0998     1.0857  28.          
2      17.64   0.3615   0.1327     1.1151  61.          
3      40.60   0.1432   0.1781     1.1565  636.         
4      26.88   0.2009   0.1661     1.1455  202.         
5      22.18   0.2614   0.1535     1.1339  118.         
6      21.59   0.2716   0.1514     1.1320  109.         
7      26.88   0.2009   0.1661     1.1455  202.         
8      43.68   0.1433   0.1781     1.1565  779.         
9      60.20   0.1801   0.1705     1.1494  1876.        
10     37.52   0.1466   0.1774     1.1559  512.         
11     27.86   0.1920   0.1680     1.1472  223.         
12     25.90   0.2109   0.1640     1.1436  182.         
13     24.92   0.2222   0.1617     1.1414  163.         
14     24.92   0.2222   0.1617     1.1414  163.         
------------------------------------------------------
Total  ---     ---      ---        ---     5254.        
                                           metric tons  
------------------------------------------------------

* the expression Equation from eq. (6)

______________________________________________________________________

This is a parametric method that provides an unbiased estimate. Note that the calculation for bias correction are done on each daily value. Therefore, this method is particularly helpful if there is a large variance in daily discharges, if the prediction interval changes greatly over the range of data, or if many of the predictions are made near the extremes of the relation.

SUMMARY

Although the results can vary widely among data sets, it is interesting to compare the computed sediment discharges from the various methods for the example problem. Note that the sediment discharge computed with no bias correction is 13 percent smaller than the estimate using MVUE (method 3). The QMLE (method 1) estimate is 4 percent larger than the MVUE estimate. The MVUE and the Smearing estimate (method 2) differ by 0.6 percent. The direction of these differences appear typical; however, the magnitude of the differences could vary widely.

REFERENCES

Bradu, D., and Mundlak, Y., 1970, Estimation in lognormal linear models: J. Am. Stat. Assoc., v. 65, no. 329, p. 198-211.
Cohn, T.A., DeLong, L.L., Gilroy, E.J., Hirsch, R.M., and Wells, D.K., 1989, Estimating constituent loads: Water Resources Research, v. 25, no. 5, p. 937-942.
Cohn, T.A., and Gilroy, E.J., 1991, Estimating loads from periodic records: U.S. Geological Survey Branch of Systems Analysis Technical Memo 91.01, 81 p.
Duan, N., 1983, Smearing estimate -- a nonparametric retransformation method: Journal of the American Statistical Association, 78, 383, p. 605-610
Ferguson, R.I., 1986, River loads underestimated by rating curves: Water Resources Research, 22, 1, p. 74-76.
Finney, D.J., 1941, On the distribution of a variate whose logarithm is normally distributed: J. R. Stat. Soc. Suppl., 7, p. 155-161.

APPENDIX

Fortran Program Used to Compute Equations 6 and 7.


_____________________________________________________________________________________

C     This program computes the bias corrected load using MVUE
C
      INTEGER I
      REAL A,AA,ARG,B,QBAR,QSAVE,QSTAR,QVAR,S2,TEMP,UE,V,XM,XN
      WRITE(*,*)'This program reads the daily discharges from a file
     #called MVUE.IN'
      WRITE(*,*)'and writes the bias corrected loads to a file called
     #MVUE.OUT.'
      WRITE(*,*)'     The discharges in MVUE.IN can be in any format
     # but must be in the'
      WRITE(*,*)'first column.'
      WRITE(*,*)' '
      WRITE(*,*)'Enter a 1 to continue or any letter to quit. '
      READ(*,'(I5)',ERR=99)I
      IF(I.NE.1)GO TO 99
      OPEN (15,FILE='MVUE.IN',STATUS='OLD')
      OPEN (16,FILE='MVUE.OUT')
C  INPUT VARIABLES
C     XN   = Number of data pairs
C     QBAR = Mean of ln Q
C     QVAR = Sum of ((LN Q - LN Q MEAN)**2)
C     XM   = Number of degrees of freedom of residuals
C     S2   = Mean square error
C     A    = Exponentiated intercept of sediment load - discharge 
C            relation
C     B    = Slope of sediment load - discharge relation
      XN = 7.
      QBAR = 3.7390
      QVAR = 3.45225
      XM = 5.
      S2 = 0.3465
      A = 0.020
      B = 2.76
C      Enter updated data
      WRITE(16,*)'Output of the MVUE program'
      WRITE(*,*)'Default (Number of data pairs) XN = ',XN
      WRITE(*,*)'Enter n to select default or enter new value- '
      READ(*,'(F9.0)',ERR=10)AA
      XN=AA
   10 WRITE(16,*)'  XN   (Number of data pairs)               = ',XN
      WRITE(*,*)'Default (Mean of ln Q) QBAR = ',QBAR
      WRITE(*,*)'Enter n to select default or enter new value- '
      READ(*,'(F9.0)',ERR=11)AA
      QBAR=AA
   11 WRITE(16,*)'  QBAR (Mean of ln Q)                       = ',QBAR
      WRITE(*,*)'Default [Sum of ((LN Q - LN Q MEAN)**2)] QVAR 
     # = `, QVAR
      WRITE(*,*)'Enter n to select default or enter new value- '
      READ(*,'(F9.0)',ERR=12)AA
      QVAR=AA
   12 WRITE(16,*)'  QVAR [Sum of ((LN Q - LN Q MEAN)**2)]     = ',QVAR
      WRITE(*,*)'Default (No of degrees of freedom) XM = ',XM
      WRITE(*,*)'Enter n to select default or enter new value- '
      READ(*,'(F9.0)',ERR=13)AA
      XM=AA
   13 WRITE(16,*)'  XM   (No of degrees of freedom)           = ',XM
      WRITE(*,*)'Default (Mean square error) S2 = ',S2
      WRITE(*,*)'Enter n to select default or enter new value- '
      READ(*,'(F9.0)',ERR=14)AA
      S2=AA
   14 WRITE(16,*)'  S2   (Mean square error)                  = ',S2
      WRITE(*,*)'Default (Intercept of Qs-Q relation) A = ',A
      WRITE(*,*)'Enter n to select default or enter new value- '
      READ(*,'(F9.0)',ERR=15)AA
      A=AA
   15 WRITE(16,*)'  A    (Intercept of Qs-Q relation)         = ',A
      WRITE(*,*)'Default (Slope of Qs-Q relation) B = ',B
      WRITE(*,*)'Enter n to select default or enter new value- '
      READ(*,'(F9.0)',ERR=16)AA
      B=AA
   16 WRITE(16,*)'  B    (Slope of Qs-Q relation)             = ',B
      QAVE=0.0
C  *********************
C  READ QSTAR (DISCHARGES AT WHICH LOADS ARE PREDICTED)
      WRITE(16,*)' '
      WRITE(16,*)' Obs  Daily      V eq 5     Arguement       gm
     #  L mvue'
      WRITE(16,*)' '
      I=0
   20 READ(15,*,END=30) QSTAR
        V=(1./XN + (((LOG(QSTAR)-QBAR)**2)/QVAR))
        ARG=((XM+1)/(2.*XM))*((1.- V)*S2)
        TEMP = GM(XM,ARG)
        UE=(A*(QSTAR**B))*TEMP
        I=I+1
        QAVE=QAVE+UE
        WRITE(16,901) I,QSTAR,V,ARG,TEMP,UE
  901 FORMAT(I5,G12.4,3G12.5,F12.0)
      GO TO 20
   30 CONTINUE
      WRITE(16,900)QAVE
  900 FORMAT(43X,'Total',3X,F14.0)
   99 CONTINUE
      END
      FUNCTION GM(XM,ARG)
C  ********
C     FUNCTION TO COMPUTE FINNEY'S GM(T)
C
C     AUTHOR.........TIM COHN
C     DATE...........OCTOBER 1, 1986
C
C     XM    R*4    NUMBER OF DEGREES OF FREEDOM OF RESIDUALS
C     ARG   R*4    ARGUMENT TO FINNEY'S FUNCTION
C
      DATA TOL/1.E-7/
      IF (ABS(ARG) .GT. 50.0) THEN
          PRINT *,' MAGNITUDE OF ARG IS TOO LARGE (GM)'
          GM = 0.0
          RETURN
      ENDIF
         GM = 1.0
         IF(XM .LE. 0.0) RETURN
         BT = ARG*XM**2/(2.0*(XM+1.0))
         TERM = 1.0
      DO 10 P=1,1000
         TERM = TERM * BT/((XM/2.0+P-1.0)*P)
         GM = GM+TERM
         IF(P .GT. 1.0 .AND. ABS(TERM) .LT. TOL) RETURN
   10 CONTINUE
          PRINT *, 'GM DID NOT CONVERGE'
          RETURN
      END
 
For comments or questions, send mail to: jrgray@usgs.gov