17.2 Censoring

With censoring, values of the dependent variable that lie above or below a certain threshold are not observed at their true value; they are reported at the threshold itself.
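The text does not show how the data set censoring was built, so the following is only a minimal sketch of how such data could arise. The sample size of 50 matches the output in this section, but the regressor range, the coefficients, the seed, and the name censoring_sim are assumptions made for illustration. The threshold of 0 is also an assumption, chosen because it is the default left limit of censReg.

```r
# Hypothetical data-generating process, not the one behind the `censoring` data set
set.seed(1)
n <- 50
x <- runif(n, 0, 10)                  # assumed regressor
yreal <- -2 + 0.5 * x + rnorm(n)      # latent (true) outcome
y <- pmax(yreal, 0)                   # left-censoring: values below 0 are reported as 0

censoring_sim <- data.frame(x, yreal, y)

# How many observations sit at the censoring point?
table(censored = censoring_sim$y == 0)
```

Every observation with a latent value below 0 is reported as exactly 0, while all other observations keep their true value.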

If all data were reported at their true values, the following regression model could be estimated:

bhat_real = lm(yreal ~ x, data = censoring)
summary(bhat_real)
## 
## Call:
## lm(formula = yreal ~ x, data = censoring)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2959 -0.7400 -0.3396  0.5394  2.4274 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.09074    0.31283  -6.683 2.25e-08 ***
## x            0.50987    0.05614   9.082 5.38e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.139 on 48 degrees of freedom
## Multiple R-squared:  0.6321, Adjusted R-squared:  0.6245 
## F-statistic: 82.48 on 1 and 48 DF,  p-value: 5.375e-12

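Ignoring the censoring and running OLS on the reported values leads to biased results. The slope shrinks from 0.51 to 0.31 and the intercept moves from -2.09 to -0.53: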

bhat_censored = lm(y ~ x, data = censoring)
summary(bhat_censored)
## 
## Call:
## lm(formula = y ~ x, data = censoring)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.74069 -0.52897 -0.07453  0.45831  2.07293 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.5346     0.2413  -2.215   0.0315 *  
## x             0.3090     0.0433   7.136 4.55e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8782 on 48 degrees of freedom
## Multiple R-squared:  0.5148, Adjusted R-squared:  0.5047 
## F-statistic: 50.92 on 1 and 48 DF,  p-value: 4.552e-09
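A single sample cannot show whether such a difference is systematic. The small Monte Carlo sketch below repeats the comparison many times on simulated data. The data-generating process (true slope 0.5, censoring from below at 0), the seed, and the number of replications are assumptions for illustration and are not taken from the censoring data set.

```r
# Monte Carlo sketch: OLS slope on latent vs. censored outcomes (hypothetical setup)
set.seed(42)
slopes <- replicate(500, {
  x <- runif(50, 0, 10)
  yreal <- -2 + 0.5 * x + rnorm(50)   # latent outcome, true slope 0.5
  y <- pmax(yreal, 0)                 # reported outcome, left-censored at 0
  c(real     = unname(coef(lm(yreal ~ x))[2]),
    censored = unname(coef(lm(y ~ x))[2]))
})

# Average slope estimate across replications
rowMeans(slopes)
```

Under these assumptions the average slope from the censored outcome should fall clearly below the average slope from the latent outcome, which is the attenuation seen in the output above.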

The R package censReg, which fits a censored (Tobit) regression by maximum likelihood, reduces the bias:

library(censReg)
# default limits: left-censoring at 0, no right-censoring
b_correct = censReg(y~x,data=censoring)
summary(b_correct)
## 
## Call:
## censReg(formula = y ~ x, data = censoring)
## 
## Observations:
##          Total  Left-censored     Uncensored Right-censored 
##             50             23             27              0 
## 
## Coefficients:
##             Estimate Std. error t value  Pr(> t)    
## (Intercept) -2.22754    0.53483  -4.165 3.11e-05 ***
## x            0.52449    0.08179   6.412 1.43e-10 ***
## logSigma     0.20741    0.14219   1.459    0.145    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Newton-Raphson maximisation, 6 iterations
## Return code 1: gradient close to zero (gradtol)
## Log-likelihood: -54.35225 on 3 Df
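The estimates (intercept -2.23, slope 0.52) are close to those from the regression on the true values. The logSigma row is the logarithm of the error standard deviation, so exp(0.20741) gives a standard deviation of roughly 1.23.

To make the idea behind this estimator concrete, the following is a minimal base-R sketch of a Tobit log-likelihood for left-censoring at 0, maximised with optim. It is not the implementation used by censReg. The simulated data, the function name negloglik, and the starting values are assumptions for illustration.

```r
# Hypothetical data, left-censored at 0
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
yreal <- -2 + 0.5 * x + rnorm(n)
y <- pmax(yreal, 0)

# Negative Tobit log-likelihood; par = (intercept, slope, log(sigma))
negloglik <- function(par) {
  mu <- par[1] + par[2] * x
  sigma <- exp(par[3])                # log parametrisation keeps sigma positive
  cens <- y <= 0
  # censored observations contribute P(latent y <= 0),
  # uncensored observations contribute the normal density at y
  -(sum(pnorm(0, mean = mu[cens], sd = sigma, log.p = TRUE)) +
    sum(dnorm(y[!cens], mean = mu[!cens], sd = sigma, log = TRUE)))
}

# Start from the (biased) OLS estimates and log(sigma) = 0
start <- c(unname(coef(lm(y ~ x))), 0)
fit <- optim(start, negloglik, method = "BFGS")
names(fit$par) <- c("(Intercept)", "x", "logSigma")

fit$par                    # estimates on the same scale as the censReg output
exp(fit$par["logSigma"])   # implied error standard deviation
```

The censored observations enter only through the probability of falling at or below the threshold, which is how the estimator uses them without treating the reported 0 as a true value.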