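I need to combine five data frames of about 60 columns each. They all have the same columns, and I am combining them by taking their means, since they represent the same values. The problem is not how to combine them, but how to do it efficiently. Here is some sample data/code: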
#reproducible random data
set.seed(123)
dat1 <- data.frame( a = rnorm(16), b = rnorm(16), c = rnorm(16), d = rnorm(16), e = rnorm(16), f = rnorm(16))
dat2 <- data.frame( a = rnorm(16), b = rnorm(16), c = rnorm(16), d = rnorm(16), e = rnorm(16), f = rnorm(16))
dat3 <- data.frame( a = rnorm(16), b = rnorm(16), c = rnorm(16), d = rnorm(16), e = rnorm(16), f = rnorm(16))
#This works but is inefficient
final_data<-data.frame(a=rowMeans(cbind(dat1$a,dat2$a,dat3$a)),
b=rowMeans(cbind(dat1$b,dat2$b,dat3$b)),
c=rowMeans(cbind(dat1$c,dat2$c,dat3$c)),
d=rowMeans(cbind(dat1$d,dat2$d,dat3$d)),
e=rowMeans(cbind(dat1$e,dat2$e,dat3$e)),
f=rowMeans(cbind(dat1$f,dat2$f,dat3$f))
)
#what results should look like
head(final_data)
# a b c d e f
# 1 0.573813625 0.17695841 -0.1434628 -0.53673101 0.353906578 0.24262067
# 2 0.135689926 -0.69206908 0.2888584 -0.37215810 -0.038298083 -0.23317107
# 3 0.004068807 0.44666945 0.5205118 0.09587453 -0.308528454 0.30516883
# 4 0.347100292 0.02401646 0.1409754 -0.15931120 0.587047386 -0.08684867
# 5 0.006529998 0.09010946 0.4932670 0.62606230 -0.005235813 -0.36967000
# 6 0.240225778 -0.45824825 -0.5000004 0.66131121 0.619480608 0.55650611
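The problem is that I don't want to rewrite a=rowMeans(cbind(dat1$a,dat2$a,dat3$a)) for each of the 60 columns in the new data frame. Can you think of a good way to solve this?
Edit: I am accepting the answer below because it lets me choose which columns to apply it to: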
final_data1<-as.data.frame(sapply(colnames(dat1),function(i)
rowMeans(cbind(dat1[,i],dat2[,i],dat3[,i]))))
> identical(final_data1,final_data)
[1] TRUE
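Answer 0 (score: 3)
I would use rbind to combine all the data sets into a single one, and then use data.table to compute the column means for each original row position (for speed)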
library(data.table)
df <- rbind(dat1, dat2, dat3)
indx <- seq_len(nrow(df)) %% nrow(dat1)
setDT(df)[, lapply(.SD, mean), by = indx]
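The best thing about this approach is that once all the data is in a single data set, you can compute any function you like (not just mean) without calling cbind each time. It is also easy to run the operation on specific columns using the .SDcols argument, for example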
cols <- names(df)[c(1,3:4)]
df[, lapply(.SD, mean), .SDcols = cols, by = indx]
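As a minimal sketch of the "any function, chosen columns" point, the following swaps mean for median. It assumes data.table is installed, uses smaller made-up data frames than the question, and builds the row index with rbindlist's idcol instead of the modulo trick; the names src, row_id and row_medians are illustrative only.

```r
library(data.table)

# Small stand-ins for the question's data frames (assumed data, 4 rows each)
set.seed(123)
dat1 <- data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4))
dat2 <- data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4))
dat3 <- data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4))

# Stack the frames; "src" records which frame each row came from
stacked <- rbindlist(list(dat1, dat2, dat3), idcol = "src")

# Number the rows within each source frame so matching rows share an id
stacked[, row_id := seq_len(.N), by = src]

# Apply a different summary (median) to a chosen subset of columns
cols <- c("a", "c")
row_medians <- stacked[, lapply(.SD, median), by = row_id, .SDcols = cols]
```

The explicit row_id makes the grouping readable, at the cost of one extra column compared with the modulo index.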
Answer 1 (score: 3)
How about this?
(dat1+dat2+dat3)/3
Or, to first select/reorder a subset of columns and then add up the resulting data.frames, you could do this:
jj <- letters[1:6]
Reduce(`+`, lapply(list(dat1,dat2,dat3), `[`, jj))/3
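The Reduce idea can be wrapped so that the divisor follows the number of data frames. This is only a sketch: mean_of_frames is a hypothetical helper name, and the data are small made-up frames. Selecting columns by name first matters because `+` on data frames lines columns up by position, not by name. Note also that a single NA makes the result NA for that cell, unlike rowMeans with na.rm = TRUE.

```r
# Small stand-ins for the question's data frames (assumed data)
set.seed(123)
make_frame <- function() data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4))
frames <- list(make_frame(), make_frame(), make_frame())

# Hypothetical helper: element-wise mean over a list of data frames
mean_of_frames <- function(frames, cols = names(frames[[1]])) {
  # Select columns by name so every frame has the same column order
  aligned <- lapply(frames, `[`, cols)
  # Sum the frames element-wise, then divide by how many there are
  Reduce(`+`, aligned) / length(aligned)
}

res <- mean_of_frames(frames, cols = c("a", "c"))
```

With five data frames, as in the question, only the list passed in changes.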
Answer 2 (score: 2)
Try this:
sapply(colnames(dat1),function(i)
rowMeans(cbind(dat1[,i],dat2[,i],dat3[,i])))
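A possible generalization of the snippet above, sketched with made-up data: keep the data frames in a list so the code does not name dat1, dat2, dat3 individually, and pass the columns of interest as a character vector. The names frames, cols and col_means are illustrative.

```r
# Small stand-ins for the question's data frames (assumed data)
set.seed(123)
dat1 <- data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4))
dat2 <- data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4))
dat3 <- data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4))

frames <- list(dat1, dat2, dat3)
cols <- c("a", "c")  # columns to average

col_means <- as.data.frame(sapply(cols, function(col) {
  # Inner sapply pulls one column from every frame into an n-by-k matrix
  rowMeans(sapply(frames, `[[`, col))
}))
```

sapply returns a matrix here, so as.data.frame is only needed if a data frame is wanted as the result.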
Answer 3 (score: 1)
You could also try:
mapply(function(x,y,z) rowMeans(cbind(x,y,z)), dat1, dat2, dat3)
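The mapply call above is written for exactly three data frames. One way to avoid fixing that number, sketched here with made-up data, is to build the argument list with do.call. The name mean_across is illustrative, and the sketch assumes all frames have their columns in the same order, since mapply pairs columns by position.

```r
# Small stand-ins for the question's data frames (assumed data)
set.seed(123)
dat1 <- data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4))
dat2 <- data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4))
dat3 <- data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4))

frames <- list(dat1, dat2, dat3)

# Receives the i-th column of every frame and averages them row by row
mean_across <- function(...) rowMeans(cbind(...))

# Equivalent to mapply(mean_across, dat1, dat2, dat3) for any list length
col_means <- do.call(mapply, c(list(FUN = mean_across), frames))
```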
Answer 4 (score: 1)
Here is another attempt.
lst <- list(dat1, dat2, dat3)
bind <- do.call(cbind, lst)
sapply(colnames(dat1), function(x) {
rowMeans(bind[, colnames(bind) == x])
})
a b c d e f
[1,] -0.69651939 -0.43495675 0.267416865 0.48329853 0.61255811 -1.505583996
[2,] -0.07074860 0.09862994 -0.003961269 0.73806156 -0.80865458 -1.367104216
[3,] -0.90342272 -0.62873624 0.260394162 -0.28607083 1.10855838 -1.073984557
[4,] -0.05890636 0.81463842 -0.227212609 0.21552260 -0.20440539 -0.071603144
[5,] 0.34237648 0.11332086 -0.673674065 -0.17747223 0.21157555 0.641724519
[6,] -0.15563697 -0.10291304 0.334530993 -0.42936296 0.16148849 0.635475661
[7,] 0.05404325 1.36754458 -0.375816720 0.20686341 0.78680115 0.553046376
[8,] -0.73117177 0.92057378 0.501956982 0.70190124 0.69835069 0.350644246
[9,] 0.17803759 0.04951559 -1.098479453 -0.26502658 -0.61354619 1.027449014
[10,] -0.48196619 0.11175892 -0.179521990 -0.75229105 0.31444472 0.083272675
[11,] -0.32993871 -0.01253952 -0.585723144 0.70656176 -0.32358449 -0.252437496
[12,] -0.96078171 1.44073015 0.221025206 0.30641093 -0.89929299 0.005243541
[13,] 0.03855730 -0.07904409 0.579366082 0.87307855 0.08949804 0.023818143
[14,] -0.28243416 0.68603908 -0.046795603 -0.09192619 0.26275774 0.594420728
[15,] -0.83591175 -0.62040012 0.598931246 -0.22719000 0.50836421 -0.135153053
[16,] -0.55951822 0.42339116 0.162560131 -0.08010072 0.79547162 -0.334898253
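A related variant, not the approach used in this answer: instead of binding the frames side by side and matching on column names, they can be stacked into a three-dimensional array and averaged over the third dimension in a single rowMeans call. The sketch uses made-up data and assumes every frame is all-numeric with the same shape and column order.

```r
# Small stand-ins for the question's data frames (assumed data)
set.seed(123)
dat1 <- data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4))
dat2 <- data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4))
dat3 <- data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4))

frames <- list(dat1, dat2, dat3)
n <- nrow(dat1)
p <- ncol(dat1)
k <- length(frames)

# Rows x columns x frames array; slice [, , i] holds the i-th frame
stacked <- array(unlist(lapply(frames, as.matrix), use.names = FALSE),
                 dim = c(n, p, k))

# dims = 2 keeps the first two dimensions and averages over the frames
col_means <- rowMeans(stacked, dims = 2)
colnames(col_means) <- names(dat1)
```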