我试图从ols获得分组预测,并将预测分配回原始data.frame中的列。在sas中,这可以通过以下方式完成:
proc reg data = ds;
by group_var1 group_var2;
model y= x;
output out = ds r = resd p = pred;
run;
上面的代码会正确地将预测分配给pred
变量,并将残差分配给原始数据集中正确“块”中的resd
变量。在R中,我最接近的是如下:
d <- data.frame(x = rnorm(20), y = rnorm(20), g1 = c(rep('a', 10), rep('b', 10)), g2 = rep(c(rep('c', 5), rep('d', 5)), 2))
fun <- function(d) predict(lm(y ~ x, data = d), d)
d['predict'] <- unlist(by(d, d[, c('g1', 'g2')], fun))
x y g1 g2 predict
1 -0.53089730 0.26437295 a c -0.01569909
2 -1.70298591 -0.58804638 a c -0.01960800
3 0.31134574 -0.96575392 a c -0.01289022
4 1.03821508 0.36409789 a c -0.01046612
5 -1.05180195 0.84922972 a c -0.01743631
6 0.94785058 0.16559779 a d 0.55257659
7 -0.11779401 -2.31900972 a d 0.65193420
8 -0.87618526 -2.29891776 a d -0.54113668
9 0.28450262 -0.68698073 a d -1.48699542
10 0.44388469 -1.54596297 a d -1.22664063
11 -1.15656711 1.70991300 b c -0.26021797
12 -1.26949128 -0.05968582 b c -1.67447336
13 0.08648475 -1.56257791 b c -2.68096176
14 1.16149361 -1.40203666 b c -1.14057100
15 0.86558927 -0.73587454 b c -0.92904930
16 -1.15168500 0.55817377 b d 1.34287585
17 1.17898623 -0.84767449 b d 0.12289709
18 -0.61372747 1.55786932 b d 1.06128454
19 -0.31233192 1.15423216 b d 0.90352047
20 0.61869842 1.42415426 b d 0.41617707
但是,我觉得这种方法并不健全,因为unlist
只是简单地展平输出而没有任何匹配组的方法。我确实看到了预测错误分配给其他群体的情况。是否有一种聪明而有力的方法来实现这一目标?
答案 0 :(得分:3)
R中有很多选项,例如使用plyr
library(plyr)
ddply(d,.(g1,g2),fun)
g1 g2 1 2 3 4 5
1 a c 0.4218236 -0.20147871 0.4544890 -0.2330185 0.2070659
2 a d 1.0728608 1.08735907 1.0239526 0.8740484 1.4750465
3 b c -0.1288420 0.20727212 0.2407177 0.4017079 -0.5237893
4 b d -0.6598871 -0.06777359 -0.2847552 -1.6400449 -0.2821023
或使用data.table
library(data.table)
DT <- as.data.table(d)
DT[,as.list(predict(lm(y~x))),by= c('g1', 'g2')]
g1 g2 1 2 3 4 5
1: a c 0.4218236 -0.20147871 0.4544890 -0.2330185 0.2070659
2: a d 1.0728608 1.08735907 1.0239526 0.8740484 1.4750465
3: b c -0.1288420 0.20727212 0.2407177 0.4017079 -0.5237893
4: b d -0.6598871 -0.06777359 -0.2847552 -1.6400449 -0.2821023