Merging all possible pairwise combinations of several data frames

Time: 2016-05-24 11:43:37

Tags: r loops merge

I want to merge, column-wise, every possible pair of these three data frames (i.e. nine combinations):

frame1 = data.frame(a=c(1,2,3), b=c(1,2,3), c=c(1,2,3))
frame2 = data.frame(a=c(2,1,3), b=c(2,1,3), c=c(2,1,3))
frame3 = data.frame(a=c(3,2,1), b=c(3,2,1), c=c(3,2,1))

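The frames contain the same three rows, but not in the same order, so I also want the merge to match rows on the pairs of values in columns a and b of the two frames being merged. For example: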

a b c
1 1 1 
2 2 2
3 3 3

+

a b c
2 2 2
1 1 1   
3 3 3

=

a.x b.x c.x a.y b.y c.y
1 1 1 1 1 1  
2 2 2 2 2 2
3 3 3 3 3 3

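Then, for each merged result, I want to take the absolute difference between each pair of values in columns c.x and c.y and sum those differences to get a "score" (which will of course be zero in this example). I want to store each score in the corresponding cell of an initially empty 3x3 matrix (i.e. the score for frame1 with frame2 should go in cell [2,1], and so on):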

nframes = 3
frames = c(frame1,frame2,frame3)

matrix = matrix(, nrow = nframes, ncol = nframes)
matrix_scores = data.frame(matrix)

for (i in frames){
  for (j in frames)
   {
    x = merge(i, j, by=c("a","b"))
    score = sum(abs(x$c.x - x$c.y))
    matrix_scores[j,i] <- score
  }
}

However, when I run the loop, I get the following message:

Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns

Also, I realize that the line

matrix_scores[j,i] <- score

will also fail, but I don't know how to express that the score should be stored in cell [1,1] on the first iteration of the loop (frame1 with frame1).

The resulting matrix should be a 3x3 matrix of all zeros:

       f1 f2 f3
frame1 0 0 0
frame2 0 0 0
frame3 0 0 0

1 Answer:

Answer 0: (score: 0)

You can do it like this:

# Put all frames in a list
d <- list(frame1, frame2, frame3)
# get all merge-combinations
gr <- expand.grid(seq_along(d), seq_along(d))

# function to merge and get the sum diff:
foo <- function(i, x, gr){
  tmp <- merge(x[[gr[i, 1]]], x[[gr[i, 2]]], by=c("a", "b"))
  sum(abs(tmp$c.x - tmp$c.y))
}

# result matrix
matrix(sapply(seq_len(nrow(gr)), foo, d, gr), length(d), length(d), byrow = TRUE)
      [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0

# The scores are laid out as follows ("i_j" = frame i merged with frame j):
matrix(apply(gr, 1, paste, collapse="_"), length(d), length(d), byrow = TRUE)
      [,1]  [,2]  [,3] 
[1,] "1_1" "2_1" "3_1"
[2,] "1_2" "2_2" "3_2"
[3,] "1_3" "2_3" "3_3"
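
The cell layout shown above can also be made explicit by naming the rows and columns of the result. The following is only a sketch under a few assumptions: the helper `pair_score`, the list names, and the `x`/`y` column names passed to `expand.grid` are illustrative and not part of the original answer.

```r
# Example frames from the question
frame1 <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3), c = c(1, 2, 3))
frame2 <- data.frame(a = c(2, 1, 3), b = c(2, 1, 3), c = c(2, 1, 3))
frame3 <- data.frame(a = c(3, 2, 1), b = c(3, 2, 1), c = c(3, 2, 1))

# A named list keeps the frame labels available for dimnames
d <- list(frame1 = frame1, frame2 = frame2, frame3 = frame3)

# Hypothetical helper: merge on (a, b), then sum |c.x - c.y|
pair_score <- function(x, y) {
  m <- merge(x, y, by = c("a", "b"))
  sum(abs(m$c.x - m$c.y))
}

# All ordered index pairs; the first column varies fastest
gr <- expand.grid(x = seq_along(d), y = seq_along(d))
scores <- mapply(function(i, j) pair_score(d[[i]], d[[j]]), gr$x, gr$y)

# With byrow = TRUE, rows correspond to the second frame (y)
# and columns to the first frame (x) of each merge
labelled <- matrix(scores, nrow = length(d), ncol = length(d), byrow = TRUE,
                   dimnames = list(y = names(d), x = names(d)))
labelled
```

With this layout, `labelled["frame2", "frame1"]` holds the score for frame1 merged with frame2, which is the [2,1] placement the question asks for.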


# alternative using apply:

# function to merge and get the sum diff:
foo <- function(y, x){
  tmp <- merge(x[[ y[1] ]], x[[ y[2] ]], by=c("a", "b"))
  sum(abs(tmp$c.x - tmp$c.y))
}
# result matrix
matrix(apply(gr, 1, foo, d), length(d), length(d), byrow = TRUE)
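
For completeness, the loop from the question can likely be kept almost as written. As far as I can tell, the error arises because `c(frame1, frame2, frame3)` flattens the three data frames into one list of nine column vectors, so `merge` receives plain vectors that have no columns named `a` and `b`. A minimal sketch, assuming the frames are stored with `list()` and the loops run over indices rather than over the frames themselves (the list names are my own addition):

```r
# Example frames from the question
frame1 <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3), c = c(1, 2, 3))
frame2 <- data.frame(a = c(2, 1, 3), b = c(2, 1, 3), c = c(2, 1, 3))
frame3 <- data.frame(a = c(3, 2, 1), b = c(3, 2, 1), c = c(3, 2, 1))

# c() on data frames concatenates their columns into one flat list
n_flat <- length(c(frame1, frame2, frame3))  # 9 column vectors, not 3 frames

# list() keeps each data frame intact
frames <- list(frame1 = frame1, frame2 = frame2, frame3 = frame3)
n <- length(frames)

# Numeric matrix pre-filled with NA, labelled by frame name
matrix_scores <- matrix(NA_real_, nrow = n, ncol = n,
                        dimnames = list(names(frames), names(frames)))

# Loop over indices so i and j can also address matrix cells
for (i in seq_along(frames)) {
  for (j in seq_along(frames)) {
    x <- merge(frames[[i]], frames[[j]], by = c("a", "b"))
    matrix_scores[j, i] <- sum(abs(x$c.x - x$c.y))
  }
}
matrix_scores
```

Looping over `seq_along(frames)` addresses both problems raised in the question: `frames[[i]]` is a full data frame, and `i` and `j` are integers that can index `matrix_scores` directly.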