Question

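A quick question about how to use ldply from the plyr package when the outputs have different lengths. Here is a simplified version of the code I am using and the error I get: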

# fit one regression model per country and keep the models in a list:
> SecreatWeapon <- dlply(merged1,~country.x, function(df) {
+     lm(log(child_mortality) ~ log(IHME_usd_gdppc)+ hiv_prev,data=df)
+ })
> 
# functions to extract the output of interest
> extract.coefs <- function(mod) c(extract.coefs = summary(mod)$coefficients[,1])
> extract.se.coefs <- function(mod) c(extract.se.coefs = summary(mod)$coefficients[,2])
> 
# function to combine the extracted output
> res <- ldply(SecreatWeapon, extract.coefs)
Error in list_to_dataframe(res, attr(.data, "split_labels")) : 
 Results do not have equal lengths

The error arises because some of the models have NA coefficients, for example:

> SecreatWeapon[[1]]

Call:
lm(formula = log(child_mortality) ~ log(IHME_usd_gdppc) + hiv_prev, 
    data = df)

Coefficients:
       (Intercept)  log(IHME_usd_gdppc)             hiv_prev  
           -4.6811               0.5195                   NA

As a result, the extracted outputs have different lengths. For one model I get:

> summary(SecreatWeapon[[1]])$coefficients
                  Estimate Std. Error   t value     Pr(>|t|)
(Intercept)         -4.6811000  0.6954918 -6.730633 6.494799e-08
log(IHME_usd_gdppc)  0.5194643  0.1224292  4.242977 1.417349e-04

but for another I get:

> summary(SecreatWeapon[[10]])$coefficients
                   Estimate  Std. Error    t value     Pr(>|t|)
(Intercept)           18.612698   1.7505236  10.632646 1.176347e-12
log(IHME_usd_gdppc)   -2.256465   0.1773498 -12.723244 6.919009e-15
hiv_prev            -272.558951 160.3704493  -1.699558 9.784053e-02

Is there a simple fix? Many thanks,

Antonio Pedro.

Answer 1

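The $coefficients element returned by summary.lm( . ) gives different output from coef() applied to the lm object whenever the model has an NA "coefficient": summary drops the NA rows, while coef() keeps them, so its output has the same length for every model. Would you be satisfied with something like this: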

coef.se <- function(mod) {
  extract.coefs <- function(mod) coef(mod)  # lengths all the same
  extract.se.coefs <- function(mod) summary(mod)$coefficients[, 2]
  # outer merge on the row names keeps the rows whose coefficient is NA
  merge(extract.coefs(mod), extract.se.coefs(mod), by = 'row.names', all = TRUE)
}

With Roland's example, it gives:

> coef.se(fit)
    Row.names          x         y
1 (Intercept) -0.3606557 0.1602034
2          x1  2.2131148 0.1419714
3          x2         NA        NA

You can rename x to coef and y to se.coef.
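
If only the point estimates are needed, a minimal sketch of the same idea is to pass coef() straight to ldply, since it returns one entry per model term (NA included) and so every result has the same length. The toy data frame and the names toy, g and models below are made up for illustration and are not from the question:

```r
library(plyr)

# Hypothetical toy data: in group "a" x2 is constant, so its coefficient is NA
set.seed(1)
toy <- data.frame(
  g  = rep(c("a", "b"), each = 6),
  x1 = rnorm(12),
  x2 = c(rep(1, 6), rnorm(6))
)
toy$y <- 1 + 2 * toy$x1 + rnorm(12)

# One model per group, as in the question
models <- dlply(toy, ~g, function(df) lm(y ~ x1 + x2, data = df))

# coef() keeps the aliased term as NA, so ldply can bind the rows
res <- ldply(models, coef)
res
```

The row for group "a" should then carry NA in the x2 column instead of triggering the "Results do not have equal lengths" error.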

Answer 2

y <- c(1,2,3)
x1 <- c(0.6,1.1,1.5)
x2 <- c(1,1,1)
fit <- lm(y~x1+x2)

summary(fit)$coef
#              Estimate Std. Error   t value   Pr(>|t|)
#(Intercept) -0.3606557  0.1602034 -2.251236 0.26612016
#x1           2.2131148  0.1419714 15.588457 0.04078329

#function for full matrix, adjusted from getAnywhere(print.summary.lm)
full_coeffs <- function (fit) {
     fit_sum <- summary(fit)    
     cn <- names(fit_sum$aliased)
     coefs <- matrix(NA, length(fit_sum$aliased), 4, 
                     dimnames = list(cn, colnames(fit_sum$coefficients)))
     coefs[!fit_sum$aliased, ] <- fit_sum$coefficients
     coefs
}

full_coeffs(fit)
#              Estimate Std. Error   t value   Pr(>|t|)
#(Intercept) -0.3606557  0.1602034 -2.251236 0.26612016
#x1           2.2131148  0.1419714 15.588457 0.04078329
#x2                  NA         NA        NA         NA
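
To connect this back to ldply, one possible sketch is to flatten the padded matrix into a named vector per model. The helper extract_est_se and the toy data below are assumptions for illustration only; full_coeffs is repeated from above so that the block runs on its own:

```r
library(plyr)

# Repeated from the answer above: coefficient matrix padded with NA rows
full_coeffs <- function(fit) {
  fit_sum <- summary(fit)
  cn <- names(fit_sum$aliased)
  coefs <- matrix(NA, length(fit_sum$aliased), 4,
                  dimnames = list(cn, colnames(fit_sum$coefficients)))
  coefs[!fit_sum$aliased, ] <- fit_sum$coefficients
  coefs
}

# Hypothetical helper: estimates and standard errors as one fixed-length vector
extract_est_se <- function(mod) {
  m <- full_coeffs(mod)
  c(est = m[, "Estimate"], se = m[, "Std. Error"])
}

# Hypothetical toy data: in group "a" x2 is constant, so its coefficient is NA
set.seed(1)
toy <- data.frame(
  g  = rep(c("a", "b"), each = 6),
  x1 = rnorm(12),
  x2 = c(rep(1, 6), rnorm(6))
)
toy$y <- 1 + 2 * toy$x1 + rnorm(12)

models <- dlply(toy, ~g, function(df) lm(y ~ x1 + x2, data = df))

# Every model now yields 6 values, so the rows can be combined
res <- ldply(models, extract_est_se)
res
```

Each row holds the group label followed by three estimates and three standard errors, with NA where a term was aliased.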

Using ldply to handle outputs of different lengths
