I want to merge, column-wise, every possible pair of these three data frames (i.e., nine combinations):
frame1 = data.frame(a=c(1,2,3), b=c(1,2,3), c=c(1,2,3))
frame2 = data.frame(a=c(2,1,3), b=c(2,1,3), c=c(2,1,3))
frame3 = data.frame(a=c(3,2,1), b=c(3,2,1), c=c(3,2,1))
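They contain the same 3 rows, but not in the same order, so I also want the merge to match rows on the pairs of values in columns a and b of the two frames being merged. For example: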
a b c
1 1 1
2 2 2
3 3 3
+
a b c
2 2 2
1 1 1
3 3 3
=
a.x b.x c.x a.y b.y c.y
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
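Then, for each merged frame, I want to take the absolute difference between each pair of values in columns c.x and c.y and sum those differences to get a "score" (which will of course be zero in this example). I want to store that score in the corresponding cell of an empty 3x3 matrix (i.e., the score of frame1 against frame2 should go in cell [2,1], and so on):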
nframes = 3
frames = c(frame1,frame2,frame3)
matrix = matrix(, nrow = nframes, ncol = nframes)
matrix_scores = data.frame(matrix)
for (i in frames){
for (j in frames)
{
x = merge(i, j, by=c("a","b"))
score = sum(abs(x$c.x - x$c.y))
matrix_scores[j,i] <- score
}
}
However, when I run the loop, I get the following message:
Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns
I also realize that the line
matrix_scores[j,i] <- score
will fail as well, but I don't know how to express that I want the score stored in cell [1,1] on the first iteration of the loop (frame1 against frame1).
The resulting matrix should be a 3x3 matrix of all zeros:
f1 f2 f3
frame1 0 0 0
frame2 0 0 0
frame3 0 0 0
Answer 0 (score: 0)
You can do it like this:
# Put all frames in a list
d <- list(frame1, frame2, frame3)
# get all merge-combinations
gr <- expand.grid(seq_along(d), seq_along(d))
# function to merge and get the sum diff:
foo <- function(i, x, gr){
tmp <- merge(x[[gr[i, 1]]], x[[gr[i, 2]]], by=c("a", "b"))
sum(abs(tmp$c.x - tmp$c.y))
}
# result matrix
matrix(sapply(seq_len(nrow(gr)), foo, d, gr), length(d), length(d), byrow = TRUE)
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0
# The scores are laid out as follows (each cell is "first index_second index" of the merged pair):
matrix(apply(gr, 1, paste, collapse="_"), length(d), length(d), byrow = TRUE)
[,1] [,2] [,3]
[1,] "1_1" "2_1" "3_1"
[2,] "1_2" "2_2" "3_2"
[3,] "1_3" "2_3" "3_3"
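If labelled rows and columns like those in the question are wanted, one option is to build the matrix with `outer()` and set `dimnames` afterwards. This is only a sketch: the helper name `score_pair` and the labels `f1`..`f3` are assumptions made for illustration. Note that `merge(..., by = c("a", "b"))` keeps a single `a` and `b` column and adds suffixes only to `c`.

```r
# Data from the question
frame1 <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3), c = c(1, 2, 3))
frame2 <- data.frame(a = c(2, 1, 3), b = c(2, 1, 3), c = c(2, 1, 3))
frame3 <- data.frame(a = c(3, 2, 1), b = c(3, 2, 1), c = c(3, 2, 1))

# A named list keeps each data frame intact and provides row labels
d <- list(frame1 = frame1, frame2 = frame2, frame3 = frame3)

# Columns after merging on a and b: only c gets the .x/.y suffixes
merged_names <- names(merge(frame1, frame2, by = c("a", "b")))
merged_names  # "a" "b" "c.x" "c.y"

# Hypothetical helper: score for the pair (d[[i]], d[[j]])
score_pair <- function(i, j) {
  tmp <- merge(d[[i]], d[[j]], by = c("a", "b"))
  sum(abs(tmp$c.x - tmp$c.y))
}

# outer() expects a vectorized function, so wrap the helper in Vectorize()
scores <- outer(seq_along(d), seq_along(d), Vectorize(score_pair))
dimnames(scores) <- list(names(d), paste0("f", seq_along(d)))
scores
```

Here `scores[i, j]` holds the score of frame i merged with frame j. Because the score is a sum of absolute differences, it is the same for (i, j) and (j, i), so the orientation of the matrix does not change the values.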
# alternative using apply:
# function to merge and get the sum diff:
foo <- function(y, x){
tmp <- merge(x[[ y[1] ]], x[[ y[2] ]], by=c("a", "b"))
sum(abs(tmp$c.x - tmp$c.y))
}
# result matrix
matrix(apply(gr, 1, foo, d), length(d), length(d), byrow = TRUE)
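The loop from the question can also be made to work with small changes. The sketch below assumes the likely cause of the error is `frames = c(frame1,frame2,frame3)`: calling `c()` on data frames concatenates their columns into one list of nine vectors, so each `i` and `j` in the loop is a bare numeric vector with no columns `a` or `b` for `merge()` to match on. Storing the frames with `list()` and looping over positions keeps each data frame intact and gives integer indices for the result cell.

```r
# Data from the question
frame1 <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3), c = c(1, 2, 3))
frame2 <- data.frame(a = c(2, 1, 3), b = c(2, 1, 3), c = c(2, 1, 3))
frame3 <- data.frame(a = c(3, 2, 1), b = c(3, 2, 1), c = c(3, 2, 1))

# c() flattens the data frames into their columns
n_flat <- length(c(frame1, frame2, frame3))
n_flat  # 9 columns, not 3 data frames

# list() keeps the three data frames as separate elements
frames <- list(frame1, frame2, frame3)
nframes <- length(frames)

# Empty numeric matrix to receive the scores
matrix_scores <- matrix(NA_real_, nrow = nframes, ncol = nframes)

# Loop over positions so i and j can index both the list and the matrix
for (i in seq_along(frames)) {
  for (j in seq_along(frames)) {
    x <- merge(frames[[i]], frames[[j]], by = c("a", "b"))
    # frame i against frame j goes in cell [j, i], as asked in the question
    matrix_scores[j, i] <- sum(abs(x$c.x - x$c.y))
  }
}
matrix_scores
```

With the data from the question, every cell comes out as zero, which matches the expected result shown there.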