Error after applying lm in a loop

Date: 2016-05-16 15:31:30

Tags: r regression

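I am trying to apply the following code, which works on any data without NA values. However, when the data contains NA values, I get the following message: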

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases

The code I am using is:

    # Pre-allocate one result row per Year/Firm combination
    m <- data.frame(matrix(ncol = 5, nrow = length(unique(df$Year)) * length(unique(df$Firm))))
    l <- 0
    for (i in unique(df$Year)) {
      for (j in unique(df$Firm)) {
        l <- l + 1
        # Fit one regression per Year/Firm group (this call raises the error above)
        mod <- lm(Ri ~ Rm + Rz, data = df, subset = df$Year == i & df$Firm == j)
        m[l, ] <- c(i,
                    as.character(j),
                    mod$coefficients[2],
                    mod$coefficients[3],
                    summary(mod)$sigma)
      }
    }
    names(m) <- c("Year", "Firm", "B1", "B2", "e")

Here is a sample of the data I am using:

Year   Firm    Ri    Rm    Rz
2009   A       30    55    NA
2009   A       0     55    NA
2009   A       1     55    NA
2010   A       7     55    85
2010   A       15    NA    85
2011   A       0     55    85
2011   A       3.5   55    85
2011   A       8     NA    85
2009   B       24    55    85
2009   B       30    55    85
2009   B       25    55    85
2010   B       5.2   NA    85
2010   B       11.8  55    85
2011   B       0     55    NA
2011   B       90    55    NA
2011   B       57    55    NA

Any suggestions?

1 answer:

Answer 0: (score: 4)

Apart from the data issue above, you can rewrite your code using a combination of the dplyr and broom packages:

library(dplyr)
library(broom)  # provides tidy()
df$Rz <- 85 # Impute values of Rz to make the code work
df %>% group_by(Year, Firm) %>% do(tidy(lm(Ri ~ Rm + Rz, data = .)))

Source: local data frame [6 x 7]
Groups: Year, Firm [6]

   Year   Firm        term estimate std.error statistic     p.value
  <int> <fctr>       <chr>    <dbl>     <dbl>     <dbl>       <dbl>
1  2009      A (Intercept) 10.33333  9.837570  1.050395 0.403735888
2  2009      B (Intercept) 26.33333  1.855921 14.188819 0.004930448
3  2010      A (Intercept)  7.00000       NaN       NaN         NaN
4  2010      B (Intercept) 11.80000       NaN       NaN         NaN
5  2011      A (Intercept)  1.75000  1.750000  1.000000 0.500000000
6  2011      B (Intercept) 49.00000 26.286879  1.864048 0.203331016

Update: adding a filter step so that lm is fitted only to Year / Firm groups in which neither independent variable is entirely NA:

df %>% group_by(Year, Firm) %>% filter(!all(is.na(Rm)) & !all(is.na(Rz))) %>% do(tidy(lm(Ri ~ Rm + Rz, data = .)))
Source: local data frame [4 x 7]
Groups: Year, Firm [4]

   Year   Firm        term estimate std.error statistic     p.value
  <int> <fctr>       <chr>    <dbl>     <dbl>     <dbl>       <dbl>
1  2009      B (Intercept) 26.33333  1.855921  14.18882 0.004930448
2  2010      A (Intercept)  7.00000       NaN       NaN         NaN
3  2010      B (Intercept) 11.80000       NaN       NaN         NaN
4  2011      A (Intercept)  1.75000  1.750000   1.00000 0.500000000
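The same guard can also be written for the loop in the question. The following is a minimal base R sketch, not the original poster's code: it rebuilds the sample data from the listing above, counts the complete cases in each Year / Firm group, and skips the fit when there are none. The names `groups`, `rows`, `n_ok` and the extra `n` column are introduced here for illustration only.

```r
# Sample data copied from the question's listing.
df <- data.frame(
  Year = rep(c(2009, 2010, 2011, 2009, 2010, 2011), times = c(3, 2, 3, 3, 2, 3)),
  Firm = rep(c("A", "B"), each = 8),
  Ri = c(30, 0, 1, 7, 15, 0, 3.5, 8, 24, 30, 25, 5.2, 11.8, 0, 90, 57),
  Rm = c(55, 55, 55, 55, NA, 55, 55, NA, 55, 55, 55, NA, 55, 55, 55, 55),
  Rz = c(NA, NA, NA, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, NA, NA, NA),
  stringsAsFactors = FALSE
)

# One row per Year/Firm combination that actually occurs in the data.
groups <- unique(df[, c("Year", "Firm")])
rows <- vector("list", nrow(groups))

for (k in seq_len(nrow(groups))) {
  g <- df[df$Year == groups$Year[k] & df$Firm == groups$Firm[k], ]
  # Rows with no NA in any model variable; lm() drops all other rows.
  n_ok <- sum(complete.cases(g[, c("Ri", "Rm", "Rz")]))
  if (n_ok == 0) next  # nothing left to fit, so skip this group
  mod <- lm(Ri ~ Rm + Rz, data = g)
  rows[[k]] <- data.frame(
    Year = groups$Year[k],
    Firm = groups$Firm[k],
    B1 = unname(coef(mod)[2]),
    B2 = unname(coef(mod)[3]),
    e = summary(mod)$sigma,
    n = n_ok
  )
}

# Drop the skipped groups and stack the rest into one data frame.
m <- do.call(rbind, Filter(Negate(is.null), rows))
```

Building the result from a list of one-row data frames keeps Year and the coefficients numeric, whereas filling a row with `c(i, as.character(j), ...)` coerces every value to character.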

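This output shows intercept-only fits because the predictors in the sample data have no variability. If your data does have such variability (as in the mtcars dataset, for example), you will get output like the following. Note that the call below passes data = mtcars rather than data = ., so every group is fitted on the whole dataset, which is why the estimates are identical across the three cyl groups; pass data = . to fit each group on its own rows.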

mtcars %>% group_by(cyl) %>% do(tidy(lm(mpg ~ wt + am, data = mtcars)))
Source: local data frame [9 x 6]
Groups: cyl [3]

    cyl        term    estimate std.error   statistic      p.value
  <dbl>       <chr>       <dbl>     <dbl>       <dbl>        <dbl>
1     4 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13
2     4          wt -5.35281145 0.7882438 -6.79080719 1.867415e-07
3     4          am -0.02361522 1.5456453 -0.01527855 9.879146e-01
4     6 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13
5     6          wt -5.35281145 0.7882438 -6.79080719 1.867415e-07
6     6          am -0.02361522 1.5456453 -0.01527855 9.879146e-01
7     8 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13
8     8          wt -5.35281145 0.7882438 -6.79080719 1.867415e-07
9     8          am -0.02361522 1.5456453 -0.01527855 9.879146e-01
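The three cyl groups above share identical estimates because each fit was given the full mtcars data. As a rough sketch of a genuinely per-group fit, the following uses base R `split()` and `lapply()` instead of dplyr and broom (a swapped-in technique, chosen only to keep the example dependency-free); `fits` and `coefs` are illustrative names.

```r
# Split mtcars by cylinder count and fit the same model to each subset.
fits <- lapply(split(mtcars, mtcars$cyl), function(d) {
  coef(lm(mpg ~ wt + am, data = d))
})

# Stack the coefficient vectors: one row per cyl group, one column per term.
coefs <- do.call(rbind, fits)
print(coefs)
```

With dplyr, the equivalent change would be to pass `data = .` inside `do()` so that each group is fitted on its own rows.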

Edit: adding a simple example to demonstrate the problem in the original post:

x <- 1:10
y <- 1:10
z <- NA
df <- data.frame(x = x, y = y, z = z)
lm(x ~ y + z, data = df)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases
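The error arises because lm, under the default `na.action` (normally `na.omit`), drops every row that has an NA in any model variable, and here that leaves no rows at all. A small sketch of how one might check this before fitting is shown below; `df_na`, `n_complete` and `msg` are illustrative names, not part of the original post.

```r
# Same shape as the example above: z is NA in every row.
df_na <- data.frame(x = 1:10, y = 1:10, z = NA)

# Number of rows without any NA; this is what lm() would be left with.
n_complete <- sum(complete.cases(df_na))

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    lm(x ~ y + z, data = df_na)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(n_complete)
print(msg)
```

Checking `complete.cases()` per group before calling lm is one way to decide whether to skip a group or impute values first.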