This report provides examples of three methods of handling bias correction; (1) the Quasi-Maximum Likelihood Estimator (QMLE), (2) the Minimum Variance Unbiased Estimator (MVUE), and (3) the Smearing Estimator (SM). Two of these methods are recommended by Cohn and Gilroy (1991). These two methods are the Minimum Variance Unbiased Estimator (MVUE) for use when the distribution of errors is assumed normal and the Smearing Estimator (SM) for situations in which non-normal error distribution is identified. Although the focus here is on an appropriate bias correction factor, it is well worth emphasizing that miss-specification of the appropriate regression model in a particular situation can yield sizable errors and render any care taken in correcting for bias as a useless exercise.
At a gaging station instantaneous streamflow and sediment data were collected (Table 1). Streamflow and sediment discharge are reported in metric units and natural logarithms are used throughout the example.
Table 1: Instantaneous values of streamflow (Q) and sediment discharge obtained from a sampling program. ---------------------------------------------------------------------- Obser- Streamflow Sediment Natural lnQ - mean lnQ vation (Q) Discharge Log (m3/s) (metric tons) of (Q) ---------------------------------------------------------------------- 1 125.44 12698.0 4.83183 1.09279 2 70.0 1287.9 4.24850 0.50946 3 68.86 2575.9 4.23208 0.49304 4 27.86 197.7 3.32719 -0.41185 5 41.44 1623.5 3.72425 -0.01479 6 13.38 25.4 2.59376 -1.14528 7 24.92 87.1 3.21567 -0.52337 ---------------------------------------------------------------------- = 3.7390 =3.45225= QBAR = QVAR ----------------------------------------------------------------------
A log-log regression between the instantaneous values of sediment discharge and streamflow is applied to the data in Table 1 and is shown in Figure 1. A considerable amount of analysis has convinced us that this is the correct model and the equation for the above data is:
(1)
where: = instantaneous sediment discharge or load estimated
from the transport (rating) curve in metric tons per day
Q = instantaneous streamflow discharge in m3/sec
0.020 = the exponentiated intercept (-3.901).
The mean square error of the regression (s2) is 0.3465.
Figure 1. -- The relation between sediment discharge and streamflow for the example.
From the daily streamflow record at this gaging station we want to compute sediment loads for a 14 day period. Although there are several ways to accomplish this task, we will use equation 1 to compute a daily sediment load and sum these daily loads for the 14 day period. In doing this we have assumed the relation between instantaneous and daily values are equivalent. This is not always the case. Table 2 shows the 14 day series of daily streamflow and the predicted daily sediment discharge computed from eq. 1.
Table 2: Daily values of streamflow and predicted daily sediment discharge for a 14 day study period. --------------------------------------------- Daily mean Daily Sediment Discharge Discharge Obs Month Day (m3/sec) (metric tons/ day) --------------------------------------------- 1 1 21 13.44 26.03 2 1 22 17.64 55.13 3 1 23 40.60 550.25 4 1 24 26.88 176.30 5 1 25 22.18 103.73 6 1 26 21.59 96.29 7 1 27 26.88 176.30 8 1 28 43.68 673.30 9 1 29 60.20 1631.99 10 1 30 37.52 442.58 11 1 31 27.86 194.62 12 2 1 25.90 159.12 13 2 2 24.92 143.05 14 2 3 24.92 143.05 --------------------------------------------- Total --- --- --- 4571.74 ---------------------------------------------
(2)
where: = estimated sediment discharge (load) using the quasi-maximum likelihood estimator (QMLE)
LRC = Sediment load estimated from the rating curve
s2 = mean square error of the regression
Therefore, in the example the total load computed from the regression (LRC) for the 14 day period is 4,571.74 metric tons and the mean square error of the regression (s2) is 0.3465. Inserting into equation 2 yields:
(3)
= 5436.6 metric tons
Although used extensively, this method is generally not recommended because the results are not unbiased estimates and the method often over-corrects for the downward bias of the rating curve (Cohn and Gilroy, 1991, p.48). Methods #2 and #3 described below are suggested for use by Cohn and Gilroy, 1991. This QMLE method may be suitable if the predicted discharges are within the interval of a fairly large calibration data set and the sample mean square error (s2) is a satisfactory estimator for the population mean square error.
where:
= estimated sediment discharge (load) using the smearing estimator
ei = residuals from least squares regression from Eq. 1 and are the differences in the natural logarithms of measured and computed sediment discharge.
In the example there are seven observations in the regression equation (Eq. 1). The residuals are shown in Table 3.
Table 3: Residuals from the example problem regression equation. --------------------------------------------------------- Natural Log of ------Sediment Discharge---------|-----Residual---- Obs Observed Predicted ei exp(ei) --------------------------------------------------------- 1 9.4492 9.4238 0.0254 1.0257 2 7.1608 7.8138 -0.6531 0.5205 3 7.8540 7.7685 0.0854 1.0892 4 5.2868 5.2710 0.0157 1.0158 5 7.3923 6.3669 1.0254 2.7883 6 3.2347 3.2468 -0.0120 0.9881 7 4.4671 4.9632 -0.4961 0.6089 Total --- --- --- 8.0365 ---------------------------------------------------------______________________________________________________________________
Equation 4 is:
(5)
= 5248.7 metric tons
This is a nonparametric method that only requires the assumption that the residuals are independent and identically distributed (homoscedastic) but they can follow any distribution. When residuals are normally distributed it performs nearly as well as MVUE (example no. 3). Also, this estimator can be generalized to transformations other than the logarithm. For example, if the transformation function is the square root or inverse then the function used is the inverse of the selected transformation -- the square or inverse.
(6)
and:
where: = estimated sediment discharge (load) for the day using the minimum variance unbiased estimator
LRC(t) = Sediment load estimated from the transport (rating) curve for each day (t)
gm = a function introduced by Finney (1951) and used by Bradu and Mundlak (1970). This function can be evaluated by the computer program (Cohn and others, 1989, p. 939) in this Appendix.
m = degrees of freedom of the regression equation
V = an estimate of the variability at a given value of
streamflow discharge and computed by eq. 7
Computation of V is:
(7)
where: Q* = daily mean streamflow for the day loads are being predicted
Q = instantaneous streamflow used in the regression (logarithms are shown in Table 1)
N = number of data points in the regression
The minimum variance unbiased estimator (MVUE) can be evaluated using the Fortran program attached to the end of this example. This computer program evaluates Equation 6 and 7 along with the expression gm. The computer program requires an input file called MVUE.IN containing a single column of streamflow discharges (column 4 of Table 2) for which the predicted daily sediment discharge needs to be corrected for bias. The output for the program is a file, MVUE.OUT, which contains the bias corrected daily sediment discharges and the total sediment discharge for the period. Output for the example program is shown in Table 4. The program requires input variables including XN, QBAR, QVAR, XM, S2, A, and B. These variables are defined in the program and values from the example problem for these variables are included in the program as default values. Different values can be added from the terminal and the default values can be changed by editing the program and re-compiling.
Table 4: Output for MVUE from computer program ------------------------------------------------------ Daily Q V Argument* gm LMVUE Obs m3/s (eq.7) of gm() (eq.6) ------------------------------------------------------ 1 13.44 0.5198 0.0998 1.0857 28. 2 17.64 0.3615 0.1327 1.1151 61. 3 40.60 0.1432 0.1781 1.1565 636. 4 26.88 0.2009 0.1661 1.1455 202. 5 22.18 0.2614 0.1535 1.1339 118. 6 21.59 0.2716 0.1514 1.1320 109. 7 26.88 0.2009 0.1661 1.1455 202. 8 43.68 0.1433 0.1781 1.1565 779. 9 60.20 0.1801 0.1705 1.1494 1876. 10 37.52 0.1466 0.1774 1.1559 512. 11 27.86 0.1920 0.1680 1.1472 223. 12 25.90 0.2109 0.1640 1.1436 182. 13 24.92 0.2222 0.1617 1.1414 163. 14 24.92 0.2222 0.1617 1.1414 163. ------------------------------------------------------ Total --- --- --- --- 5254. metric tons ------------------------------------------------------
* the expression from eq. (6)
______________________________________________________________________
This is a parametric method that provides an unbiased estimate. Note that the calculation for bias correction are done on each daily value. Therefore, this method is particularly helpful if there is a large variance in daily discharges, if the prediction interval changes greatly over the range of data, or if many of the predictions are made near the extremes of the relation.
Although the results can vary widely among data sets, it is interesting to compare the computed sediment discharges from the various methods for the example problem. Note that the sediment discharge computed with no bias correction is 13 percent smaller than the estimate using MVUE (method 3). The QMLE (method 1) estimate is 4 percent larger than the MVUE estimate. The MVUE and the Smearing estimate (method 2) differ by 0.6 percent. The direction of these differences appear typical; however, the magnitude of the differences could vary widely.
Fortran Program Used to Compute Equations 6 and 7.
C This program computes the bias corrected load using MVUE C INTEGER I REAL A,AA,ARG,B,QBAR,QSAVE,QSTAR,QVAR,S2,TEMP,UE,V,XM,XN WRITE(*,*)'This program reads the daily discharges from a file #called MVUE.IN' WRITE(*,*)'and writes the bias corrected loads to a file called #MVUE.OUT.' WRITE(*,*)' The discharges in MVUE.IN can be in any format # but must be in the' WRITE(*,*)'first column.' WRITE(*,*)' ' WRITE(*,*)'Enter a 1 to continue or any letter to quit. ' READ(*,'(I5)',ERR=99)I IF(I.NE.1)GO TO 99 OPEN (15,FILE='MVUE.IN',STATUS='OLD') OPEN (16,FILE='MVUE.OUT') C INPUT VARIABLES C XN = Number of data pairs C QBAR = Mean of ln Q C QVAR = Sum of ((LN Q - LN Q MEAN)**2) C XM = Number of degrees of freedom of residuals C S2 = Mean square error C A = Exponentiated intercept of sediment load - discharge C relation C B = Slope of sediment load - discharge relation XN = 7. QBAR = 3.7390 QVAR = 3.45225 XM = 5. S2 = 0.3465 A = 0.020 B = 2.76 C Enter updated data WRITE(16,*)'Output of the MVUE program' WRITE(*,*)'Default (Number of data pairs) XN = ',XN WRITE(*,*)'Enter n to select default or enter new value- ' READ(*,'(F9.0)',ERR=10)AA XN=AA 10 WRITE(16,*)' XN (Number of data pairs) = ',XN WRITE(*,*)'Default (Mean of ln Q) QBAR = ',QBAR WRITE(*,*)'Enter n to select default or enter new value- ' READ(*,'(F9.0)',ERR=11)AA QBAR=AA 11 WRITE(16,*)' QBAR (Mean of ln Q) = ',QBAR WRITE(*,*)'Default [Sum of ((LN Q - LN Q MEAN)**2)] QVAR # = `, QVAR WRITE(*,*)'Enter n to select default or enter new value- ' READ(*,'(F9.0)',ERR=12)AA QVAR=AA 12 WRITE(16,*)' QVAR [Sum of ((LN Q - LN Q MEAN)**2)] = ',QVAR WRITE(*,*)'Default (No of degrees of freedom) XM = ',XM WRITE(*,*)'Enter n to select default or enter new value- ' READ(*,'(F9.0)',ERR=13)AA XM=AA 13 WRITE(16,*)' XM (No of degrees of freedom) = ',XM WRITE(*,*)'Default (Mean square error) S2 = ',S2 WRITE(*,*)'Enter n to select default or enter new value- ' READ(*,'(F9.0)',ERR=14)AA S2=AA 14 WRITE(16,*)' S2 (Mean square error) = ',S2 WRITE(*,*)'Default (Intercept of Qs-Q relation) A = ',A WRITE(*,*)'Enter n to select default or enter new value- ' READ(*,'(F9.0)',ERR=15)AA A=AA 15 WRITE(16,*)' A (Intercept of Qs-Q relation) = ',A WRITE(*,*)'Default (Slope of Qs-Q relation) B = ',B WRITE(*,*)'Enter n to select default or enter new value- ' READ(*,'(F9.0)',ERR=16)AA B=AA 16 WRITE(16,*)' B (Slope of Qs-Q relation) = ',B QAVE=0.0 C ********************* C READ QSTAR (DISCHARGES AT WHICH LOADS ARE PREDICTED) WRITE(16,*)' ' WRITE(16,*)' Obs Daily V eq 5 Arguement gm # L mvue' WRITE(16,*)' ' I=0 20 READ(15,*,END=30) QSTAR V=(1./XN + (((LOG(QSTAR)-QBAR)**2)/QVAR)) ARG=((XM+1)/(2.*XM))*((1.- V)*S2) TEMP = GM(XM,ARG) UE=(A*(QSTAR**B))*TEMP I=I+1 QAVE=QAVE+UE WRITE(16,901) I,QSTAR,V,ARG,TEMP,UE 901 FORMAT(I5,G12.4,3G12.5,F12.0) GO TO 20 30 CONTINUE WRITE(16,900)QAVE 900 FORMAT(43X,'Total',3X,F14.0) 99 CONTINUE END FUNCTION GM(XM,ARG) C ******** C FUNCTION TO COMPUTE FINNEY'S GM(T) C C AUTHOR.........TIM COHN C DATE...........OCTOBER 1, 1986 C C XM R*4 NUMBER OF DEGREES OF FREEDOM OF RESIDUALS C ARG R*4 ARGUMENT TO FINNEY'S FUNCTION C DATA TOL/1.E-7/ IF (ABS(ARG) .GT. 50.0) THEN PRINT *,' MAGNITUDE OF ARG IS TOO LARGE (GM)' GM = 0.0 RETURN ENDIF GM = 1.0 IF(XM .LE. 0.0) RETURN BT = ARG*XM**2/(2.0*(XM+1.0)) TERM = 1.0 DO 10 P=1,1000 TERM = TERM * BT/((XM/2.0+P-1.0)*P) GM = GM+TERM IF(P .GT. 1.0 .AND. ABS(TERM) .LT. TOL) RETURN 10 CONTINUE PRINT *, 'GM DID NOT CONVERGE' RETURN END