I have three different data frames, as follows:
V1.x<-c(1,2,3,4,5)
V2.x<-c(2,2,7,3,1)
V3.x<-c(2,4,3,2,9)
D1<-data.frame(ID=c("A","B","C","D","E"),V1.x=V1.x,V2.x=V2.x,V3.x=V3.x)
V1.y<-c(2,3,3,3,5)
V2.y<-c(1,2,3,3,5)
V3.y<-c(6,4,3,2,2)
D2<-data.frame(ID=c("A","B","C","D","E"),V1.y=V1.y,V2.y=V2.y,V3.y=V3.y)
V1<-c(3,2,4,4,5)
V2<-c(3,7,3,4,5)
V3<-c(5,4,3,6,3)
D3<-data.frame(ID=c("A","B","C","D","E"),V1=V1,V2=V2,V3=V3)
I want to add up all the V1 columns, all the V2 columns, and all the V3 columns:
V1_Add<-D1$V1.x+D2$V1.y+D3$V1
V2_Add<-D1$V2.x+D2$V2.y+D3$V2
V3_Add<-D1$V3.x+D2$V3.y+D3$V3
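This works fine for summing individual columns, but in my real data the columns run from V1 to V80, so I would rather not type out each column separately. I would also like to end up with a single data frame containing all the final sums, like this: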
ID V1 V2 V3
1 A 6 6 13
2 B 7 11 12
3 C 10 13 9
4 D 11 10 10
5 E 15 11 14
Answer 0 (score: 2)
Is this what you want?
D.Add <- data.frame(D1[,1],(D1[,-1]+D2[,-1]+D3[,-1]))
colnames(D.Add)<-colnames(D3)
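For reference, here is a self-contained sketch of this block addition on the sample data from the question. It assumes that all three data frames have the same dimensions and the same row and column order, because `+` on data frames works by position and keeps the column names of the left-hand operand.

```r
# Sample data from the question
D1 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 V1.x = c(1, 2, 3, 4, 5), V2.x = c(2, 2, 7, 3, 1), V3.x = c(2, 4, 3, 2, 9))
D2 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 V1.y = c(2, 3, 3, 3, 5), V2.y = c(1, 2, 3, 3, 5), V3.y = c(6, 4, 3, 2, 2))
D3 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 V1 = c(3, 2, 4, 4, 5), V2 = c(3, 7, 3, 4, 5), V3 = c(5, 4, 3, 6, 3))

# Drop the ID column, add the numeric blocks element-wise, then re-attach ID
D.Add <- data.frame(D1[, 1], D1[, -1] + D2[, -1] + D3[, -1])

# Borrow the clean column names (ID, V1, V2, V3) from D3
colnames(D.Add) <- colnames(D3)
print(D.Add)
```

Because nothing here refers to individual column names, the same two lines should work unchanged for V1 through V80.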
Answer 1 (score: 2)
library(reshape2)
library(plyr)
# First let's standardize column names after ID so they become V1 through Vx.
# I turned it into a function to make this easy to do for multiple data.frames
standardize_col_names <- function(df) {
  # First column remains ID, then I name the remaining V1 through Vn-1
  # (since first column is taken up by the ID)
  names(df) <- c("ID", paste("V", 1:(dim(df)[2] - 1), sep = ""))
  return(df)
}
D1 <- standardize_col_names(D1)
D2 <- standardize_col_names(D2)
D3 <- standardize_col_names(D3)
# Next, we melt the data and bind them into the same data.frame
# See one example with melt(D1, id.vars=1). I just used rbind to combine those
melted_data <- rbind(melt(D1, id.vars=1), melt(D2, id.vars=1), melt(D3, id.vars=1))
# note that the above step can be folded into the function as well.
# Then you throw all the data.frames into a list and ldply through this function.
# Finally, we cast the data into what you need which is the sum of the columns
dcast(melted_data, ID~variable, sum)
ID V1 V2 V3
1 A 6 6 13
2 B 7 11 12
3 C 10 13 9
4 D 11 10 10
5 E 15 11 14
# Combined everything above more efficiently :
standardize_df <- function(df) {
  names(df) <- c("ID", paste("V", 1:(dim(df)[2] - 1), sep = ""))
  return(melt(df, id.vars = 1))
}
all_my_data <- list(D1,D2,D3)
melted_data <- ldply(all_my_data, standardize_df)
summarized_data <- dcast(melted_data, ID~variable, sum)
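The same stack-then-sum idea can also be sketched in base R, without reshape2 or plyr. This is only an illustrative variant: the helper name `strip_suffix` is hypothetical, and it assumes the only naming difference between the data frames is a trailing `.x` or `.y`.

```r
# Sample data from the question
D1 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 V1.x = c(1, 2, 3, 4, 5), V2.x = c(2, 2, 7, 3, 1), V3.x = c(2, 4, 3, 2, 9))
D2 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 V1.y = c(2, 3, 3, 3, 5), V2.y = c(1, 2, 3, 3, 5), V3.y = c(6, 4, 3, 2, 2))
D3 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 V1 = c(3, 2, 4, 4, 5), V2 = c(3, 7, 3, 4, 5), V3 = c(5, 4, 3, 6, 3))

# Hypothetical helper: remove a trailing ".x" or ".y" so all names line up
strip_suffix <- function(df) {
  names(df) <- sub("\\.[xy]$", "", names(df))
  df
}

# Stack the three data frames on top of each other (15 rows, same columns)
stacked <- do.call(rbind, lapply(list(D1, D2, D3), strip_suffix))

# Sum every non-ID column within each ID
summed <- aggregate(. ~ ID, data = stacked, FUN = sum)
print(summed)
```

Since rows are grouped by `ID` rather than by position, this variant does not depend on the rows being in the same order in each data frame.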
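Answer 2 (score: 2)
This approach may be a bit overkill, but it should be fairly general for any number of columns and any number of "index" columns. It does assume that all of your data.frames have the same number of columns and that the columns are in the correct order. First, create a list object from all of the data.frames. I referred to this question to do that programmatically.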
ClassFilter <- function(x, class) inherits(get(x), "data.frame")
Objs <- Filter( ClassFilter, ls() )
Objs <- lapply(Objs, "get")
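Next, I wrote a function that uses Reduce to add all the numeric columns together, and then splices the result back together with the non-numeric columns at the end: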
FUN <- function(x){
  colsToProcess <- lapply(x, function(y) y[, unlist(sapply(y, is.numeric))])
  result <- Reduce("+", colsToProcess)
  # Get the non-numeric columns
  nonNumericCols <- x[[1]]
  nonNumericCols <- nonNumericCols[, !(unlist(sapply(nonNumericCols, is.numeric)))]
  return(data.frame(Index = nonNumericCols, result))
}
Finally, in action:
> FUN(Objs)
Index V1.x V2.x V3.x
1 A 6 6 13
2 B 7 11 12
3 C 10 13 9
4 D 11 10 10
5 E 15 11 14
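One caveat is that collecting objects with `ls()` picks up every data frame in the workspace, including any intermediate results. A minimal sketch of the same Reduce idea with an explicitly built list follows; the names `dfs`, `numeric_parts`, and `total` are illustrative, and the suffix stripping assumes names ending in `.x` or `.y` as in the question.

```r
# Sample data from the question
D1 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 V1.x = c(1, 2, 3, 4, 5), V2.x = c(2, 2, 7, 3, 1), V3.x = c(2, 4, 3, 2, 9))
D2 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 V1.y = c(2, 3, 3, 3, 5), V2.y = c(1, 2, 3, 3, 5), V3.y = c(6, 4, 3, 2, 2))
D3 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 V1 = c(3, 2, 4, 4, 5), V2 = c(3, 7, 3, 4, 5), V3 = c(5, 4, 3, 6, 3))

# Build the list explicitly instead of scanning the workspace with ls()
dfs <- list(D1, D2, D3)

# Keep only the numeric columns of each data frame (result stays a data frame)
numeric_parts <- lapply(dfs, function(d) d[vapply(d, is.numeric, logical(1))])

# Fold the list with "+" so that any number of data frames can be summed
total <- Reduce(`+`, numeric_parts)

# The sum inherits D1's column names; strip the ".x" suffix
names(total) <- sub("\\.[xy]$", "", names(total))

# Re-attach the ID column from the first data frame
result <- cbind(dfs[[1]]["ID"], total)
print(result)
```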
Answer 3 (score: 0)
How about simply adding the whole blocks together?
D1[,2:4] + D3[,2:4] + D2[,2:4]
...which results in...
V1.x V2.x V3.x
1 6 6 13
2 7 11 12
3 10 13 9
4 11 10 10
5 15 11 14
It assumes that the variables are in the same order in every data frame; beyond that, it should work fine.
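If the column order cannot be guaranteed, one possible workaround is to bring every data frame to a common column order before adding. The sketch below is an assumption-laden illustration: `normalize` and `D2_shuffled` are hypothetical names, and it relies on the column names differing only by a trailing `.x` or `.y`.

```r
# Sample data from the question
D1 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 V1.x = c(1, 2, 3, 4, 5), V2.x = c(2, 2, 7, 3, 1), V3.x = c(2, 4, 3, 2, 9))
D2 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 V1.y = c(2, 3, 3, 3, 5), V2.y = c(1, 2, 3, 3, 5), V3.y = c(6, 4, 3, 2, 2))
D3 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 V1 = c(3, 2, 4, 4, 5), V2 = c(3, 7, 3, 4, 5), V3 = c(5, 4, 3, 6, 3))

# Simulate a data frame whose columns arrive in a different order
D2_shuffled <- D2[c("ID", "V3.y", "V1.y", "V2.y")]

# Hypothetical helper: strip the suffix, then select columns in a reference order
normalize <- function(df, cols) {
  names(df) <- sub("\\.[xy]$", "", names(df))
  df[cols]
}

# Use D3's column order (ID, V1, V2, V3) as the reference
cols <- names(D3)
parts <- lapply(list(D1, D2_shuffled, D3), normalize, cols = cols)

# Add the numeric blocks (everything except ID) and re-attach ID
total <- Reduce(`+`, lapply(parts, function(d) d[-1]))
result <- cbind(ID = parts[[1]]$ID, total)
print(result)
```

Selecting columns by name makes the positional addition safe against reordered columns, though it still assumes the rows are in the same order.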