R - Best way to add (sum) columns across different data frames

Time: 2012-06-08 00:10:35

Tags: r

I have three different data frames, as follows:

V1.x<-c(1,2,3,4,5)
V2.x<-c(2,2,7,3,1)
V3.x<-c(2,4,3,2,9)
D1<-data.frame(ID=c("A","B","C","D","E"),V1.x=V1.x,V2.x=V2.x,V3.x=V3.x)

V1.y<-c(2,3,3,3,5)
V2.y<-c(1,2,3,3,5)
V3.y<-c(6,4,3,2,2)
D2<-data.frame(ID=c("A","B","C","D","E"),V1.y=V1.y,V2.y=V2.y,V3.y=V3.y)

V1<-c(3,2,4,4,5)
V2<-c(3,7,3,4,5)
V3<-c(5,4,3,6,3)
D3<-data.frame(ID=c("A","B","C","D","E"),V1=V1,V2=V2,V3=V3)

I want to add together all the V1 columns, all the V2 columns, and all the V3 columns:

V1_Add<-D1$V1.x+D2$V1.y+D3$V1
V2_Add<-D1$V2.x+D2$V2.y+D3$V2
V3_Add<-D1$V3.x+D2$V3.y+D3$V3

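This works fine for summing individual columns, but in the real data the columns run from V1 to V80, so I would rather not type out each column separately. I would also like to end up with a single data frame containing all the final sums, like this: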

  ID V1 V2 V3
1  A  6  6 13
2  B  7 11 12
3  C 10 13  9
4  D 11 10 10
5  E 15 11 14

4 answers:

Answer 0: (score: 2)

Is this what you want?

D.Add <- data.frame(D1[,1],(D1[,-1]+D2[,-1]+D3[,-1]))
colnames(D.Add)<-colnames(D3)
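
With 80 columns and possibly more than three data frames, the same positional addition can be written once with Reduce. The following is only a sketch under assumptions: the helper name sum_frames and its id_col argument are hypothetical, and it assumes every frame lists the same IDs in the same row order (it stops if they do not). The suffix pattern ".x" / ".y" is taken from the column names in the question.

```r
ids <- c("A", "B", "C", "D", "E")
D1 <- data.frame(ID = ids, V1.x = c(1,2,3,4,5), V2.x = c(2,2,7,3,1), V3.x = c(2,4,3,2,9))
D2 <- data.frame(ID = ids, V1.y = c(2,3,3,3,5), V2.y = c(1,2,3,3,5), V3.y = c(6,4,3,2,2))
D3 <- data.frame(ID = ids, V1 = c(3,2,4,4,5), V2 = c(3,7,3,4,5), V3 = c(5,4,3,6,3))

# Hypothetical helper, not part of the original answer
sum_frames <- function(frames, id_col = "ID") {
  ref_ids <- as.character(frames[[1]][[id_col]])
  # Refuse to add frames whose IDs are not in the same row order
  same_ids <- vapply(frames,
                     function(d) identical(as.character(d[[id_col]]), ref_ids),
                     logical(1))
  stopifnot(all(same_ids))
  # Drop the ID column, then add the remaining columns position by position
  value_parts <- lapply(frames,
                        function(d) d[, setdiff(names(d), id_col), drop = FALSE])
  totals <- Reduce(`+`, value_parts)
  # Strip the ".x" / ".y" suffixes so the result is named V1, V2, ...
  names(totals) <- sub("\\.[xy]$", "", names(totals))
  data.frame(ID = frames[[1]][[id_col]], totals)
}

D.Add <- sum_frames(list(D1, D2, D3))
```

Because the frames are passed as a list, adding a fourth data frame only means extending the list; nothing else changes.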

Answer 1: (score: 2)

library(reshape2)
library(plyr)

# First let's standardize column names after ID so they become V1 through Vx. 
# I turned it into a function to make this easy to do for multiple data.frames
standardize_col_names <- function(df) {
  # First column remains ID; the remaining columns are named V1 through V(n-1)
  # (since the first column is taken up by the ID)
  names(df) <- c("ID", paste("V",1:(dim(df)[2]-1),sep=""))
  return(df)
}

D1 <- standardize_col_names(D1)
D2 <- standardize_col_names(D2)
D3 <- standardize_col_names(D3)

# Next, we melt the data and bind them into the same data.frame
# See one example with melt(D1, id.vars=1). I just used rbind to combine those
melted_data <- rbind(melt(D1, id.vars=1), melt(D2, id.vars=1), melt(D3, id.vars=1))
# note that the above step can be folded into the function as well. 
# Then you throw all the data.frames into a list and ldply through this function.

# Finally, we cast the data into what you need which is the sum of the columns
 dcast(melted_data, ID~variable, sum)
  ID V1 V2 V3
1  A  6  6 13
2  B  7 11 12
3  C 10 13  9
4  D 11 10 10
5  E 15 11 14



# Everything above, combined more compactly:

standardize_df <- function(df) {
  names(df) <- c("ID", paste("V",1:(dim(df)[2]-1),sep=""))
  return(melt(df, id.vars = 1))
}

all_my_data <- list(D1,D2,D3)
melted_data <- ldply(all_my_data, standardize_df)
summarized_data <- dcast(melted_data, ID~variable, sum)
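
If reshape2 and plyr are not available, the same stack-then-sum idea can be sketched in base R with rbind and aggregate. This is a swapped-in technique, not what the answer above uses, and the helper name rename_cols is hypothetical. Like the answer, it assumes the value columns are in the same order in every data frame; because the sums are grouped by ID, the rows do not have to be.

```r
ids <- c("A", "B", "C", "D", "E")
D1 <- data.frame(ID = ids, V1.x = c(1,2,3,4,5), V2.x = c(2,2,7,3,1), V3.x = c(2,4,3,2,9))
D2 <- data.frame(ID = ids, V1.y = c(2,3,3,3,5), V2.y = c(1,2,3,3,5), V3.y = c(6,4,3,2,2))
D3 <- data.frame(ID = ids, V1 = c(3,2,4,4,5), V2 = c(3,7,3,4,5), V3 = c(5,4,3,6,3))

# Hypothetical helper: rename columns to ID, V1, V2, ... by position
rename_cols <- function(df) {
  names(df) <- c("ID", paste0("V", seq_len(ncol(df) - 1)))
  df
}

# Stack the renamed frames on top of each other (15 rows here)
stacked <- do.call(rbind, lapply(list(D1, D2, D3), rename_cols))

# Sum every value column within each ID
summed <- aggregate(. ~ ID, data = stacked, FUN = sum)
```

One thing to keep in mind: the formula interface of aggregate drops rows containing NA by default, so data with missing values would need an explicit decision about how to treat them.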

Answer 2: (score: 2)

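This approach may be overkill, but it should generalize to any number of columns and any number of "index" columns. It does assume that all of your data.frames have the same number of columns and that the columns are in the correct order. First, create a list object from all of the data.frames. I referred to this question to do that programmatically.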

# TRUE if the object named by x is a data.frame
ClassFilter <- function(x) inherits(get(x), "data.frame")
Objs <- Filter( ClassFilter, ls() )
Objs <- lapply(Objs, "get")

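Next, I wrote a function that uses Reduce to add all of the numeric columns together, and then splices the result together with the non-numeric columns at the end: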

FUN <- function(x){
  colsToProcess <- lapply(x, function(y) y[, unlist(sapply(y, is.numeric))])
  result <- Reduce("+", colsToProcess)
  #Get the non numeric columns
  nonNumericCols <- x[[1]]  
  nonNumericCols <- nonNumericCols[, !(unlist(sapply(nonNumericCols, is.numeric)))]
  return(data.frame(Index = nonNumericCols, result))
}

Finally, in action:

> FUN(Objs)
  Index V1.x V2.x V3.x
1     A    6    6   13
2     B    7   11   12
3     C   10   13    9
4     D   11   10   10
5     E   15   11   14
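
One caveat with collecting objects through ls() is that every data.frame in the workspace gets picked up, including unrelated ones. A possible way to narrow the selection, sketched here under assumptions, is to match object names by pattern and fetch them with mget. The environment ws, the unrelated frame lookup, and the naming pattern "^D[0-9]+$" are all hypothetical and only stand in for a real workspace.

```r
ids <- c("A", "B", "C", "D", "E")

# A fresh environment stands in for the workspace in this sketch
ws <- new.env()
ws$D1 <- data.frame(ID = ids, V1.x = c(1,2,3,4,5), V2.x = c(2,2,7,3,1), V3.x = c(2,4,3,2,9))
ws$D2 <- data.frame(ID = ids, V1.y = c(2,3,3,3,5), V2.y = c(1,2,3,3,5), V3.y = c(6,4,3,2,2))
ws$D3 <- data.frame(ID = ids, V1 = c(3,2,4,4,5), V2 = c(3,7,3,4,5), V3 = c(5,4,3,6,3))
# Hypothetical unrelated data frame that should NOT be summed
ws$lookup <- data.frame(ID = "A", note = "unrelated")

# Keep only objects named D1, D2, D3, ... (assumed naming convention)
wanted <- ls(envir = ws, pattern = "^D[0-9]+$")

# Fetch them as a named list and keep only the data frames
Objs <- Filter(is.data.frame, mget(wanted, envir = ws))
```

The resulting list can then be passed to a summing function in the same way as the list built from ls().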

Answer 3: (score: 0)

How about just adding the whole blocks together?

D1[,2:4] + D3[,2:4] + D2[,2:4]

...which results in...

  V1.x V2.x V3.x
1    6    6   13
2    7   11   12
3   10   13    9
4   11   10   10
5   15   11   14

This assumes that the variables are in the same order in every data frame, but otherwise it should work fine.
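
If the column order cannot be trusted, one option is to bring the names into agreement first and then select the columns by name rather than by position. The sketch below is an illustration only: strip_suffix is a hypothetical helper, the ".x" / ".y" suffix pattern is assumed from the question's column names, and D2 is deliberately built with its columns shuffled to show the effect. It still assumes the rows are in the same ID order.

```r
ids <- c("A", "B", "C", "D", "E")
D1 <- data.frame(ID = ids, V1.x = c(1,2,3,4,5), V2.x = c(2,2,7,3,1), V3.x = c(2,4,3,2,9))
# Same values as in the question, but with the columns in a different order
D2 <- data.frame(ID = ids, V3.y = c(6,4,3,2,2), V1.y = c(2,3,3,3,5), V2.y = c(1,2,3,3,5))
D3 <- data.frame(ID = ids, V1 = c(3,2,4,4,5), V2 = c(3,7,3,4,5), V3 = c(5,4,3,6,3))

# Hypothetical helper: drop a trailing ".x" or ".y" from the column names
strip_suffix <- function(df) {
  names(df) <- sub("\\.[xy]$", "", names(df))
  df
}

frames <- lapply(list(D1, D2, D3), strip_suffix)

# Use the first frame's value columns as the reference order
cols <- setdiff(names(frames[[1]]), "ID")

# Select columns by name so every frame is in the same order, then add
total <- Reduce(`+`, lapply(frames, function(d) d[, cols, drop = FALSE]))
result <- data.frame(ID = frames[[1]]$ID, total)
```

Selecting by name also fails with an error if a frame is missing one of the expected columns, which is usually preferable to silently adding the wrong columns together.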