How can I exclude one level of a factor variable in a robust regression?

Time: 2014-06-08 03:54:59

Tags: r

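As the two regressions below show, lm() still returns a solution when the data matrix is singular, whereas rlm() fails. lm() internally drops one factor level to avoid the singularity, but rlm() does not.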
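A minimal sketch of this difference on synthetic data. The names `y`, `x`, `z`, `g` and the toy data frame `d` are assumptions made for illustration and are not part of mydata. Here `z` is constant within each level of `g`, so it is collinear with the intercept and the level dummies. The sample of mydata suggests `dis` behaves the same way within `id`, although that is only an observation from the rows shown.

```r
library(MASS)  # provides rlm()

set.seed(42)
g <- factor(rep(c("a", "b", "c", "d"), each = 10))
z <- rep(c(1.0, 2.5, 4.0, 3.2), each = 10)  # constant within each level of g
x <- rnorm(40)
y <- 1 + 2 * x + 0.5 * z + rnorm(40)
d <- data.frame(y, x, z, g)

# lm() pivots out the redundant column and reports its coefficient as NA
fit.lm <- lm(y ~ x + z + g, data = d)
print(coef(fit.lm))

# rlm() stops instead of dropping a column; try() captures the error
fit.rlm <- try(rlm(y ~ x + z + g, data = d), silent = TRUE)
print(inherits(fit.rlm, "try-error"))
```

In this toy setup the coefficient that lm() leaves undefined is the last dummy, which mirrors the `NA` row for the last `id` level in the output below.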

Simple linear regression case:

  result.lm <- lm(log(export + import) ~ log(gdp.i*gdp.j) + 
        log(dis) + log(Sij) + AFC + GFC + I(dpgdp*0.001)+ 
        factor(id),
        data = mydata)


Coefficients: (1 not defined because of singularities)
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)        -130.56396   34.10023  -3.829 0.000177 ***
log(gdp.i * gdp.j)    1.73176    0.20873   8.297 2.39e-14 ***
log(dis)              6.89208    4.75270   1.450 0.148750    
log(Sij)             -5.18435    1.80221  -2.877 0.004502 ** 
AFC                  -1.00819    0.86188  -1.170 0.243640    
GFC                   0.49326    0.58950   0.837 0.403834    
I(dpgdp * 0.001)     -0.05713    0.03733  -1.530 0.127701    
factor(id)IDN_PHL    -7.02467    4.46062  -1.575 0.117044    
factor(id)IDN_SGP     4.10315    1.42839   2.873 0.004558 ** 
factor(id)IDN_THA    -3.37530    3.44619  -0.979 0.328675    
factor(id)IDN_VNM   -11.75983    5.24573  -2.242 0.026189 *  
factor(id)MYS_SGP    12.16543    6.13940   1.982 0.049045 *  
factor(id)MYS_THA     2.75659    0.72603   3.797 0.000200 ***
factor(id)MYS_VNM    -5.31554    3.01239  -1.765 0.079325 .  
factor(id)PHL_MYS    -3.74970    3.82106  -0.981 0.327742    
factor(id)PHL_SGP    -3.72441    3.84997  -0.967 0.334642    
factor(id)PHL_THA    -2.32179    3.26691  -0.711 0.478187    
factor(id)PHL_VNM    -2.43611    2.32941  -1.046 0.297045    
factor(id)SGP_THA     4.18854    1.68147   2.491 0.013639 *  
factor(id)SGP_VNM    -0.54607    3.62445  -0.151 0.880409    
factor(id)THA_VNM          NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.982 on 181 degrees of freedom
Multiple R-squared:  0.5808,    Adjusted R-squared:  0.5368 
F-statistic:  13.2 on 19 and 181 DF,  p-value: < 2.2e-16

Robust regression case:

To avoid the singularity in rlm() from the MASS package, I used:

library(car); library(MASS)
  result.rlm <- rlm(log(export + import) ~ log(gdp.i*gdp.j) + 
        log(dis) + log(Sij) + AFC + GFC + I(dpgdp*0.001)+ 
        factor(id, exclude == "THA_VNM"),
        data = mydata)

This returns the error message:

Error in rlm.default(x, y, weights, method = method, wt.method = wt.method,  : 
  'x' is singular: singular fits are not implemented in 'rlm'
In addition: Warning message:
In as.vector(exclude, typeof(x)) : NAs introduced by coercion
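For reference, a small sketch of how the `exclude` argument of `factor()` behaves on toy values (the vector `ids` is made up for illustration). It is a named argument passed with a single `=`, and it turns the matching values into `NA` rather than removing a dummy column, so under the default `na.action` the affected rows would be dropped from the fit:

```r
# toy values, not taken from mydata
ids <- c("IDN_MYS", "THA_VNM", "PHL_SGP", "THA_VNM")

f <- factor(ids, exclude = "THA_VNM")

print(levels(f))  # THA_VNM is no longer a level
print(is.na(f))   # the excluded entries have become NA
```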

How can I exclude one level from factor(id) so that rlm() returns a result?

A portion of mydata that can be used to reproduce the problem:

 id export  import  gdp.i   gdp.j   dis Sij AFC GFC dpgdp
 PHL_MYS    21090   54082   1.03E+11    1.44E+11    2470.863    0.243267763 0   0   4352.999196
 IDN_MYS    1273344 6350191 2.86E+11    1.44E+11    1174.196    0.222531092 0   0   4280.470783
 IDN_PHL    1352286 6501568 2.86E+11    1.03E+11    2792.088    0.194772855 0   0   72.528413
 MYS_SGP    11849639    3100352 1.44E+11    1.24E+11    315.5433    0.248594031 0   0   23398.89647
 PHL_SGP    1010140 3406594 1.03E+11    1.24E+11    2396.775    0.247965171 0   0   27751.89567
 IDN_SGP    62247342    2374634 2.86E+11    1.24E+11    886.1407    0.210675425 0   0   27679.36726
 PHL_THA    126863901   131288917   1.03E+11    1.76E+11    2210.015    0.232802184 0   0   1489.015435
 IDN_THA    174813908   406988998   2.86E+11    1.76E+11    2316.466    0.23596528  0   0   1416.487022
 SGP_THA    131102650   238482275   1.24E+11    1.76E+11    1433.936    0.242235497 0   0   26262.88023
 MYS_THA    45626339    92914926    1.44E+11    1.76E+11    1187.123    0.247368503 0   0   2863.983761
 IDN_VNM    14635829    1705791 2.86E+11    57633255739 3023.314    0.139630737 0   0   573.9790196
 MYS_VNM    1140384 19607   1.44E+11    57633255739 2040.94 0.204415889 0   0   4854.449802
 SGP_VNM    5413912 5137507 1.24E+11    57633255739 2207.195    0.216937571 0   0   28253.34628
 THA_VNM    10375   316 1.76E+11    57633255739 990.7018    0.185642137 0   0   1990.466041
 IDN_MYS    61500692    3164431 3.65E+11    1.63E+11    1174.196    0.213350529 0   0   4578.607187
 IDN_SGP    12985625    23866106    3.65E+11    1.39E+11    886.1407    0.199850336 0   0   29984.59857
 PHL_SGP    18400   116669  1.22E+11    1.39E+11    2396.775    0.248964804 0   0   30186.80165
 MYS_SGP    14410298    2247747 1.63E+11    1.39E+11    315.5433    0.248461189 0   0   25405.99138
 MYS_THA    68755379    26223833    1.63E+11    2.07E+11    1187.123    0.246396222 0   0   3036.402002
 IDN_THA    49410654    138502983   3.65E+11    2.07E+11    2316.466    0.231027427 0   0   1542.205185
 PHL_THA    72197200    166850064   1.22E+11    2.07E+11    2210.015    0.233390872 0   0   1744.408267
 SGP_THA    132220146   277333084   1.39E+11    2.07E+11    1433.936    0.240330641 0   0   28442.39339
 PHL_VNM    40525   3475176 1.22E+11    66371664817 1750.016    0.228081191 0   0   602.1758296
 SGP_VNM    9686544 6916182 1.39E+11    66371664817 2207.195    0.218722399 0   0   30788.97748
 MYS_VNM    118597  107725  1.63E+11    66371664817 2040.94 0.205795797 0   0   5382.986099
 THA_VNM    2925753 11249569    2.07E+11    66371664817 990.7018    0.183801906 0   0   2346.584097
 IDN_MYS    88079132    24559821    4.32E+11    1.94E+11    1174.196    0.213634936 0   0   5347.115267
 IDN_PHL    25877152    6138473 4.32E+11    1.49E+11    2792.088    0.190862982 0   0   190.7373159
 MYS_SGP    25406889    2592050 1.94E+11    1.69E+11    315.5433    0.248823886 0   0   29547.92915
 IDN_SGP    104020998   4359943 4.32E+11    1.69E+11    886.1407    0.201927152 0   0   34895.04442
 PHL_THA    51290535    259950903   1.49E+11    2.47E+11    2210.015    0.234834327 0   0   2057.166994
 IDN_THA    15456842    82233669    4.32E+11    2.47E+11    2316.466    0.2314039   0   0   1866.429678
 MYS_THA    25580025    405724623   1.94E+11    2.47E+11    1187.123    0.246323269 0   0   3480.685589
 SGP_THA    109397804   181203225   1.69E+11    2.47E+11    1433.936    0.241136255 0   0   33028.61474
 IDN_VNM    116169  411089  4.32E+11    77414425532 3023.314    0.128828319 0   0   952.108653
 MYS_VNM    78770   5099    1.94E+11    77414425532 2040.94 0.204073979 0   0   6299.22392
 PHL_VNM    12322   442466  1.49E+11    77414425532 1750.016    0.224837139 0   0   761.3713371
 SGP_VNM    12167407    6959737 1.69E+11    77414425532 2207.195    0.215604147 0   0   35847.15307
 THA_VNM    192568  15221723    2.47E+11    77414425532 990.7018    0.181693618 0   0   2818.538331
 IDN_MYS    195446734   38077097    5.10E+11    2.31E+11    1174.196    0.214515503 0   0   6282.103658
 IDN_PHL    2221    1074    5.10E+11    1.74E+11    2792.088    0.18941607  0   0   257.2702704
 IDN_SGP    137780587   131012335   5.10E+11    1.79E+11    886.1407    0.192218812 0   0   34794.08436
 MYS_SGP    29608269    1785983 2.31E+11    1.79E+11    315.5433    0.245966948 0   0   28511.9807
 IDN_THA    83960790    638313022   5.10E+11    2.73E+11    2316.466    0.226956384 0   0   1940.13688
 PHL_THA    93904304    489639916   1.74E+11    2.73E+11    2210.015    0.237698194 0   0   2197.40715
 MYS_THA    27635575    463572856   2.31E+11    2.73E+11    1187.123    0.248294683 0   0   4341.966779
 SGP_THA    91150086    272714486   1.79E+11    2.73E+11    1433.936    0.239243442 0   0   32853.94748
 SGP_VNM    32692201    8777399 1.79E+11    99130304099 2207.195    0.229411821 0   0   35807.78867
 PHL_VNM    1183981 452291  1.74E+11    99130304099 1750.016    0.231359489 0   0   756.4340478
 MYS_VNM    339799  1114755 2.31E+11    99130304099 2040.94 0.210114801 0   0   7295.807976
 THA_VNM    278151  32005   2.73E+11    99130304099 990.7018    0.195565711 0   0   2953.841198
 IDN_VNM    40753   568034  5.10E+11    99130304099 3023.314    0.13621204  0   0   1013.704318

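1 answer:

Answer 0 (score: 0)

Let's solve the problem in a slightly roundabout way: first create a matrix of dummy variables from id, then run rlm() while leaving out the dummy columns that correspond to the levels to be excluded.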

# create dummy matrix for id

idx <- sort(unique(as.character(mydata$id)))  # as.character() so factor ids also work
dummy <- matrix(NA, nrow = nrow(mydata), ncol = length(idx))

# one 0/1 indicator column per level of id
for (j in seq_along(idx)) {
  dummy[, j] <- as.integer(mydata$id == idx[j])
}
dummy <- data.frame(dummy)
names(dummy) <- idx
mydata <- cbind(mydata, dummy)
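The same kind of indicator columns can also be produced without an explicit loop. The following is a sketch using `model.matrix()` on a toy data frame; `toy` and its four rows are assumptions for illustration, not the full mydata:

```r
# toy data frame with a few id values
toy <- data.frame(id = factor(c("PHL_MYS", "IDN_MYS", "THA_VNM", "IDN_MYS")))

# "- 1" removes the intercept so that every level gets its own 0/1 column
dummy <- model.matrix(~ id - 1, data = toy)

# strip the "id" prefix that model.matrix() puts in front of each level name
colnames(dummy) <- sub("^id", "", colnames(dummy))
print(dummy)
```

The columns come out in the order of the factor levels (alphabetical by default), which matches the sorted order used in the loop above.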

# run rlm keeping only the first 13 id dummies (in order of appearance in mydata),
# i.e. excluding the remaining levels of id (e.g. levels 14 and 15)

model <- as.formula(paste("log(export + import) ~ log(gdp.i*gdp.j) + 
            log(dis)+ log(Sij) + AFC + GFC + I(dpgdp*0.001)", 
            paste(unique(mydata$id)[1:13], collapse = " + "),sep="+"))

result.rlm <- rlm(model, data = mydata)

summary(result.rlm)

Call: rlm(formula = model, data = mydata)
Residuals:
    Min       1Q   Median       3Q      Max 
-9.17356 -1.21478  0.07208  1.25235  5.18840 

Coefficients:
                   Value     Std. Error t value  
(Intercept)        -333.2742   48.8993    -6.8155
log(gdp.i * gdp.j)    1.2882    0.0745    17.2959
log(dis)             37.6566    6.1469     6.1262
log(Sij)              1.4847    0.6695     2.2174
AFC                   0.4229    0.2791     1.5152
GFC                  -0.0674    0.2331    -0.2892
I(dpgdp * 0.001)     -0.0819    0.0137    -5.9591
IDN_MYS              20.1923    3.4593     5.8371
IDN_PHL             -13.8168    1.9475    -7.0948
IDN_SGP              34.1516    5.4185     6.3028
MYS_SGP              72.8390   11.7699     6.1886
IDN_THA              -5.9810    0.8392    -7.1271
MYS_THA              20.6914    3.3979     6.0894
SGP_THA              15.8244    2.4896     6.3563
IDN_VNM             -16.0143    2.4548    -6.5236
THA_VNM              28.0544    4.5140     6.2150
PHL_MYS              -7.0767    1.1393    -6.2115
PHL_SGP              -3.3307    0.7094    -4.6954
PHL_THA              -2.5637    0.5553    -4.6163
PHL_VNM               4.8306    1.0629     4.5445

Residual standard error: 1.827 on 1455 degrees of freedom
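If choosing the columns to drop by hand is inconvenient, one possible variation is to let lm() identify the redundant columns and then pass the reduced design matrix to rlm(). This is only a sketch on synthetic data: the names `y`, `x`, `z`, `g` and `d` are assumptions, and it swaps the manual dummy selection above for the `NA` pattern in the lm() coefficients.

```r
library(MASS)  # provides rlm()

set.seed(42)
g <- factor(rep(c("a", "b", "c", "d"), each = 10))
z <- rep(c(1.0, 2.5, 4.0, 3.2), each = 10)  # constant within each level of g
x <- rnorm(40)
y <- 1 + 2 * x + 0.5 * z + rnorm(40)
d <- data.frame(y, x, z, g)

# lm() marks the columns it had to drop with NA coefficients
fit.lm <- lm(y ~ x + z + g, data = d)
keep <- !is.na(coef(fit.lm))

# keep only the estimable columns; the intercept column is already in X
X <- model.matrix(fit.lm)[, keep, drop = FALSE]

# matrix interface of rlm(): design matrix first, response second
fit.rlm <- rlm(X, d$y)
print(coef(fit.rlm))
```

Because the reduced matrix has full column rank, rlm() no longer hits the singularity check. The coefficients are then interpreted relative to the same set of dropped levels that lm() used.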