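Based on the following link, I wrote some code that runs a regression on subsets of my data, where the subsets are defined by a variable.
Loop linear regression and saving coefficients
In this example I created a DUMMY variable (0 or 1) to define the subsets (in reality I have 3000 subsets).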
res <- do.call(rbind, lapply(split(mydata, mydata$DUMMY), function(x){
  fit <- lm(y~x1 + x2, data=x)
  res <- data.frame(DUMMY=unique(x$DUMMY), coeff=coef(fit))
  res
}))
This produces the following dataset:
              DUMMY       coeff
0.(Intercept)     0  22.8419956
0.x1              0 -11.5623064
0.x2              0   2.1006948
1.(Intercept)     1   4.2020874
1.x1              1  -0.4924303
1.x2              1   1.0917668
However, what I want is one row per regression, with the variables in columns. I also need to include the p-values and standard errors.
DUMMY  interceptx1  coeffx1  p-valuex1  SEx1  coeffx2  p-valuex2  SEx2
    0        22.84   -11.56       0.04  0.15     2.10       0.80  0.90
    1         4.20    -0.49       0.10  0.60     1.09       0.60  1.20
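Any idea how to do this?
Answer 0 (score: 2)
Although your desired output is (IMHO) not really tidy data, here is an approach that uses data.table and a custom extractor function. It can optionally return the results in wide or long form.
The extractor function takes an lm object and returns the estimate, p-value, and standard error for every variable.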
extractor <- function(model, return_wide = FALSE){
  # get a data.table with estimate, standard error and p-value
  # (column 3 of the coefficient matrix, the t value, is dropped)
  model_summary <- as.data.table(summary(model)$coefficients[, -3])
  model_summary[, variable := names(coef(model))]
  # reshape to long form: one row per variable/measure pair
  step2 <- melt(model_summary, id.vars = "variable", variable.name = "measure")
  if(!return_wide){
    return(step2)
  }
  # reshape to wide form: one column per variable/measure pair
  step3 <- dcast(step2, 1 ~ variable + measure, value.var = "value")
  return(step3)
}
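To make the `[, -3]` step in the extractor easier to follow, here is a minimal sketch of what the coefficient matrix of an lm summary looks like. It uses the built-in `mtcars` data and a made-up model purely for illustration; neither comes from the question.

```r
# Hypothetical example on built-in data (not the data from the question)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# summary.lm stores a matrix with one row per coefficient
coef_mat <- summary(fit)$coefficients
print(colnames(coef_mat))

# Dropping column 3 (the t value) leaves estimate, standard error and p-value
kept <- coef_mat[, -3]
print(colnames(kept))
print(rownames(kept))
```

Note that the variable names live in the row names of this matrix, which is why the extractor adds them back as a `variable` column after converting to a data.table.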
Demo:
res_wide <- dat[,extractor(lm(y~x1 + x2), return_wide = TRUE), by = dummy]
> res_wide
# dummy . (Intercept)_Estimate (Intercept)_Std. Error (Intercept)_Pr(>|t|) x1_Estimate x1_Std. Error x1_Pr(>|t|) x2_Estimate x2_Std. Error x2_Pr(>|t|)
# 1: 0 . 0.04314707 0.04495702 0.3376461 -0.054364406 0.04441204 0.2214895 0.01333804 0.04620999 0.7729757
# 2: 1 . -0.04137086 0.04471550 0.3553164 0.009864255 0.04533808 0.8278539 0.05272257 0.04507189 0.2426726
res_long <- dat[,extractor(lm(y~x1 + x2)), by = dummy]
# dummy variable measure value
# 1: 0 (Intercept) Estimate 0.043147072
# 2: 0 x1 Estimate -0.054364406
# 3: 0 x2 Estimate 0.013338043
# 4: 0 (Intercept) Std. Error 0.044957023
# 5: 0 x1 Std. Error 0.044412037
# 6: 0 x2 Std. Error 0.046209987
# 7: 0 (Intercept) Pr(>|t|) 0.337646052
# 8: 0 x1 Pr(>|t|) 0.221489530
Data used:
library(data.table)
set.seed(123)
nobs <- 1000
dat <- data.table(
  dummy = sample(0:1, nobs, TRUE),
  x1 = rnorm(nobs),
  x2 = rnorm(nobs),
  y = rnorm(nobs))
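If you would rather stay with the `split()`/`lapply()` pattern from the question, a base R variant could look roughly like the sketch below. It is only an illustration under assumptions: the data are simulated in the same way as above, the helper name `one_row` and the output column names are made up here, and the model is the `y ~ x1 + x2` formula from the question.

```r
# Simulated stand-in for the question's mydata (assumption, not the real data)
set.seed(123)
nobs <- 1000
mydata <- data.frame(
  DUMMY = sample(0:1, nobs, TRUE),
  x1 = rnorm(nobs),
  x2 = rnorm(nobs),
  y = rnorm(nobs)
)

# Hypothetical helper: fit one model and return a single-row data.frame
one_row <- function(d) {
  cm <- summary(lm(y ~ x1 + x2, data = d))$coefficients
  data.frame(
    DUMMY     = d$DUMMY[1],
    intercept = cm["(Intercept)", "Estimate"],
    coeff_x1  = cm["x1", "Estimate"],
    pvalue_x1 = cm["x1", "Pr(>|t|)"],
    SE_x1     = cm["x1", "Std. Error"],
    coeff_x2  = cm["x2", "Estimate"],
    pvalue_x2 = cm["x2", "Pr(>|t|)"],
    SE_x2     = cm["x2", "Std. Error"]
  )
}

# One regression per subset, stacked into one row per subset
res <- do.call(rbind, lapply(split(mydata, mydata$DUMMY), one_row))
rownames(res) <- NULL
print(res)
```

The column names are spelled out by hand, so this only suits a small, fixed set of predictors. With many predictors, or with subsets where a coefficient cannot be estimated and is therefore missing from the summary matrix, the reshaping approach above is likely the more robust choice.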