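I am trying to apply the following code, which works on any data that has no NA values. However, when my data contains NA values, I get the following message: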
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
The code I am using is:
m <- data.frame(matrix(ncol = 5, nrow = length(unique(df$Year))*length(unique(df$Firm))))
l = 0
for(i in unique(df$Year)) {
  for(j in unique(df$Firm)) {
    l = l + 1
    mod <- lm(Ri ~ Rm + Rz, data = df, subset = df$Year == i & df$Firm == j)
    m[l,] <- c(i,
               as.character(j),
               mod$coefficients[2],
               mod$coefficients[3],
               summary(mod)$sigma)
  }
}
names(m) <- c("Year", "Firm", "B1", "B2","e")
Here is a sample of the data I am using:
Year Firm Ri Rm Rz
2009 A 30 55 NA
2009 A 0 55 NA
2009 A 1 55 NA
2010 A 7 55 85
2010 A 15 NA 85
2011 A 0 55 85
2011 A 3.5 55 85
2011 A 8 NA 85
2009 B 24 55 85
2009 B 30 55 85
2009 B 25 55 85
2010 B 5.2 NA 85
2010 B 11.8 55 85
2011 B 0 55 NA
2011 B 90 55 NA
2011 B 57 55 NA
Any suggestions?
Answer 0 (score: 4)
Apart from the data issue above, you can rewrite your code using a combination of the dplyr and broom packages:
library(dplyr)
library(broom)  # provides tidy()
df$Rz <- 85 # Impute values of Rz to make the code work
df %>% group_by(Year, Firm) %>% do(tidy(lm(Ri ~ Rm + Rz, data = .)))
Source: local data frame [6 x 7]
Groups: Year, Firm [6]
Year Firm term estimate std.error statistic p.value
<int> <fctr> <chr> <dbl> <dbl> <dbl> <dbl>
1 2009 A (Intercept) 10.33333 9.837570 1.050395 0.403735888
2 2009 B (Intercept) 26.33333 1.855921 14.188819 0.004930448
3 2010 A (Intercept) 7.00000 NaN NaN NaN
4 2010 B (Intercept) 11.80000 NaN NaN NaN
5 2011 A (Intercept) 1.75000 1.750000 1.000000 0.500000000
6 2011 B (Intercept) 49.00000 26.286879 1.864048 0.203331016
Update: adding a filter so that lm is fitted only to the Year / Firm groups in which neither independent variable is entirely NA:
df %>% group_by(Year, Firm) %>% filter(!all(is.na(Rm)) & !all(is.na(Rz))) %>% do(tidy(lm(Ri ~ Rm + Rz, data = .)))
Source: local data frame [4 x 7]
Groups: Year, Firm [4]
Year Firm term estimate std.error statistic p.value
<int> <fctr> <chr> <dbl> <dbl> <dbl> <dbl>
1 2009 B (Intercept) 26.33333 1.855921 14.18882 0.004930448
2 2010 A (Intercept) 7.00000 NaN NaN NaN
3 2010 B (Intercept) 11.80000 NaN NaN NaN
4 2011 A (Intercept) 1.75000 1.750000 1.00000 0.500000000
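This output shows intercept-only fits because the sample data provided has no variability in the other variables. If your data does have such variability (as in the mtcars dataset, for example), you will get output like the following: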
mtcars %>% group_by(cyl) %>% do(tidy(lm(mpg ~ wt + am, data = mtcars)))
Source: local data frame [9 x 6]
Groups: cyl [3]
cyl term estimate std.error statistic p.value
<dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 4 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13
2 4 wt -5.35281145 0.7882438 -6.79080719 1.867415e-07
3 4 am -0.02361522 1.5456453 -0.01527855 9.879146e-01
4 6 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13
5 6 wt -5.35281145 0.7882438 -6.79080719 1.867415e-07
6 6 am -0.02361522 1.5456453 -0.01527855 9.879146e-01
7 8 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13
8 8 wt -5.35281145 0.7882438 -6.79080719 1.867415e-07
9 8 am -0.02361522 1.5456453 -0.01527855 9.879146e-01
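Note that the call above passes data = mtcars rather than data = . to lm, so every cyl group receives the fit on the full data set, which is why the three blocks are identical. In the dplyr pipeline the per-group version would use data = . instead. For comparison, here is a minimal base R sketch of fitting the model separately within each group; the names by_cyl, fits, per_group and full are purely illustrative.

```r
# Split mtcars into one data frame per cyl value (4, 6, 8).
by_cyl <- split(mtcars, mtcars$cyl)

# Fit mpg ~ wt + am within each group and keep only the coefficients.
fits <- lapply(by_cyl, function(d) coef(lm(mpg ~ wt + am, data = d)))

# One row of coefficients per cyl value.
per_group <- do.call(rbind, fits)
print(per_group)

# For comparison: the single fit on the full data set, ignoring cyl.
full <- coef(lm(mpg ~ wt + am, data = mtcars))
print(full)
```

The full-data coefficients match the repeated rows in the listing above, while the per-group coefficients differ from one cyl value to the next.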
Edit: adding a simple example that reproduces the problem from the original post:
x <- 1:10
y <- 1:10
z <- NA
df <- data.frame(x = x, y = y, z = z)
lm(x ~ y + z, data = df)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
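The error arises because lm, by default, drops every row that has an NA in any model variable, and here no rows are left. The same thing happens in any Year / Firm group where each row is missing at least one of Ri, Rm or Rz, even if no single column is entirely NA. As a rough sketch, assuming the sample data from the question, one way to guard against this in base R is to count complete cases per group before fitting. The helper name fit_group and the extra column n are hypothetical, introduced only for illustration.

```r
# Sample data from the question.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Year Firm Ri Rm Rz
2009 A 30 55 NA
2009 A 0 55 NA
2009 A 1 55 NA
2010 A 7 55 85
2010 A 15 NA 85
2011 A 0 55 85
2011 A 3.5 55 85
2011 A 8 NA 85
2009 B 24 55 85
2009 B 30 55 85
2009 B 25 55 85
2010 B 5.2 NA 85
2010 B 11.8 55 85
2011 B 0 55 NA
2011 B 90 55 NA
2011 B 57 55 NA
")

# Hypothetical helper: fit one Year/Firm group, or return NULL when no
# row has all three model variables present.
fit_group <- function(d) {
  n_ok <- sum(complete.cases(d[, c("Ri", "Rm", "Rz")]))
  if (n_ok == 0) return(NULL)  # lm() would stop with "0 (non-NA) cases"
  mod <- lm(Ri ~ Rm + Rz, data = d)
  data.frame(Year = d$Year[1], Firm = d$Firm[1],
             B1 = unname(coef(mod)["Rm"]),
             B2 = unname(coef(mod)["Rz"]),
             e = summary(mod)$sigma,
             n = n_ok)
}

# One data frame per Year/Firm combination that occurs in the data.
groups <- split(df, list(df$Year, df$Firm), drop = TRUE)

# Fit each group, discard the skipped ones, and stack the results.
results <- Filter(Negate(is.null), lapply(groups, fit_group))
m <- do.call(rbind, results)
print(m)
```

With the sample data, the 2009 / A and 2011 / B groups are skipped because Rz is NA in all of their rows. The slope columns B1 and B2 come out as NA in the remaining groups, since Rm and Rz are constant within each group.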