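I need to compare two data frames and produce an output of the comparison. The dimensions are the same, and the column and row orders match. I want to compare each pair of corresponding cells between the two data frames and determine whether they contain the same value or different values. If the values differ, I need to know whether both values belong to one of the specific vectors I have defined, or whether they come from two different vectors. I have provided sample code below.
I couldn't find anything on the forum that does what I need, mainly because when the values differ I also need to know how different they are, according to the criteria I provide.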
#Possible Value Types for the Data Frames
typeA = c("Green", "Blue", "Purple")
typeB = c("Red", "Orange", "Yellow")
#Create Data Frames to Compare
df1 = as.data.frame(cbind(rbind("Green", "Red", "Yellow"),
                          rbind("Green", "Purple", "Red"),
                          rbind("Orange", "Orange", NA),
                          rbind(NA, "Red", "Purple")))
df2 = as.data.frame(cbind(rbind("Green", "Red", "Yellow"),
                          rbind(NA, "Purple", "Yellow"),
                          rbind("Blue", "Orange", NA),
                          rbind("Blue", "Red", "Green")))
#Data frames compared must have identical dimensions
###INSERT FUNCTION HERE
myfunction = function(df1,df2){
#compare corresponding cells and provide output based on match
#example: compare cell df1[1,1] to df2[1,1]
#if either df1[1,1] or df2[1,1] is NA then return NA, else...
#if df1[1,1] matches df2[1,1] then return "Match"
#if df1[1,1] does not match df2[1,1] but they are both in vector typeA or both in vector typeB then return "SAMEGROUP"
#if df1[1,1] does not match df2[1,1] and one is in vector typeA and the other in typeB then return "DIFFGROUP"
}
###RUN FUNCTION
df.out = myfunction(df1,df2)
#expected output
#Match: The values in df1 and df2 for that cell are identical
#SAMEGROUP: The values in df1 and df2 for that cell are different, but
##they come from the same group (typeA or typeB)
#DIFFGROUP: The values in df1 and df2 for that cell are different, and
##they come from different groups (one from typeA, one from typeB)
#NA: One or both of the corresponding cells in df1 or df2 has an NA
df.out = as.data.frame(cbind(rbind("Match", "Match", "Match"),
                             rbind(NA, "Match", "SAMEGROUP"),
                             rbind("DIFFGROUP", "Match", NA),
                             rbind(NA, "Match", "SAMEGROUP")))
Thanks!
Answer 0 (score: 2)
Thanks to jarfa for the suggestion; it put me on the right track. This does the job:
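# Convert to character matrices so == and %in% work cell by cell
df1 = as.matrix(df1)
df2 = as.matrix(df2)
#ifelse(df1==df2, "match","diff") #test
# df1 == df2 is NA wherever either cell is NA, and ifelse() returns NA there.
# %in% drops the matrix shape, but the outer ifelse() takes its shape from df1 == df2.
# "TRYAGAIN" flags a pair where a value is in neither typeA nor typeB.
ifelse(df1 == df2, "Match",
  ifelse(df1 %in% typeA & df2 %in% typeA, "SAMEGROUP",
    ifelse(df1 %in% typeB & df2 %in% typeB, "SAMEGROUP",
      ifelse(df1 %in% typeA & df2 %in% typeB, "DIFFGROUP",
        ifelse(df1 %in% typeB & df2 %in% typeA, "DIFFGROUP", "TRYAGAIN")))))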
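One way to slot this approach into the question's `myfunction()` skeleton is sketched below. It is a minimal sketch, not the poster's exact code: the two SAMEGROUP branches and the two DIFFGROUP branches are merged with `|`, and the groups are passed in as the arguments `groupA`/`groupB` (an assumption; the question uses the global `typeA`/`typeB`).

```r
# Possible value types (from the question)
typeA = c("Green", "Blue", "Purple")
typeB = c("Red", "Orange", "Yellow")

# Sketch: nested ifelse() wrapped in the question's myfunction().
# groupA/groupB as arguments are an assumption, defaulting to typeA/typeB.
myfunction = function(df1, df2, groupA = typeA, groupB = typeB) {
  stopifnot(identical(dim(df1), dim(df2)))
  m1 = as.matrix(df1)  # character matrices, even if the columns were factors
  m2 = as.matrix(df2)
  # m1 == m2 is NA where either cell is NA; ifelse() keeps that NA,
  # so the NA rule needs no separate branch.
  ifelse(m1 == m2, "Match",
    ifelse((m1 %in% groupA & m2 %in% groupA) |
           (m1 %in% groupB & m2 %in% groupB), "SAMEGROUP",
      ifelse((m1 %in% groupA & m2 %in% groupB) |
             (m1 %in% groupB & m2 %in% groupA), "DIFFGROUP",
             "TRYAGAIN")))  # value in neither group
}
```

Called as `df.out = myfunction(df1, df2)` on the question's data frames, this should return a character matrix with the same layout as the expected `df.out`; wrap it in `as.data.frame()` if a data frame is needed.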
Answer 1 (score: 1)
First, to enforce the dimension condition:
stopifnot(identical(dim(df1), dim(df2)))
As for your function, a naive, slow approach would be something like this:
for (i in seq_len(nrow(df1))) {
  for (j in seq_len(ncol(df1))) {
    # complicated if/else on df1[i, j] and df2[i, j] goes here
  }
}
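For reference, a runnable version of that loop might look like the sketch below. `classify_cell()` and `compare_loop()` are hypothetical names; the rules are the ones from the question's comments, plus the `"TRYAGAIN"` fallback from the other answer for values in neither vector.

```r
typeA = c("Green", "Blue", "Purple")
typeB = c("Red", "Orange", "Yellow")

# Hypothetical helper: classify one pair of cells with scalar if/else
classify_cell = function(x, y) {
  if (is.na(x) || is.na(y)) return(NA_character_)
  if (x == y) return("Match")
  same  = (x %in% typeA && y %in% typeA) || (x %in% typeB && y %in% typeB)
  cross = (x %in% typeA && y %in% typeB) || (x %in% typeB && y %in% typeA)
  if (same) "SAMEGROUP" else if (cross) "DIFFGROUP" else "TRYAGAIN"
}

# Naive double loop over every cell: correct, but slow on large inputs
compare_loop = function(df1, df2) {
  stopifnot(identical(dim(df1), dim(df2)))
  m1 = as.matrix(df1)
  m2 = as.matrix(df2)
  out = matrix(NA_character_, nrow(m1), ncol(m1))
  for (i in seq_len(nrow(m1)))
    for (j in seq_len(ncol(m1)))
      out[i, j] = classify_cell(m1[i, j], m2[i, j])
  out
}
```

Each cell costs an R-level function call, which is why the vectorized `ifelse()` route scales better.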
But this is easy to vectorize. See:
a = matrix(1:9, 3)
b = matrix(c(1:8, -1),3)
ifelse(a == b, 'match', 'nomatch')
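Your if/else will certainly be more complicated, but I think you can work it out from this. It will be some nested ifelse() calls.
Edit: create a function that returns the group of a given value. Then the statement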
groupfun(a) == groupfun(b)
should return just a matrix of TRUEs and FALSEs, which will be easy to work with.
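A minimal sketch of that idea follows, assuming `groupfun()` labels each cell `"A"` or `"B"` (the labels and the helper `compare_groups()` are hypothetical). Unlike the nested `ifelse()`, a value that is in neither vector yields `NA` here rather than a separate flag.

```r
typeA = c("Green", "Blue", "Purple")
typeB = c("Red", "Orange", "Yellow")

# Hypothetical groupfun(): "A" or "B" per cell, NA for missing values
# or values in neither vector. Writing into a copy keeps the dimensions.
groupfun = function(m) {
  g = m
  g[] = NA_character_
  g[m %in% typeA] = "A"
  g[m %in% typeB] = "B"
  g
}

compare_groups = function(df1, df2) {
  m1 = as.matrix(df1)
  m2 = as.matrix(df2)
  # Logical matrix; NA where either cell has no group
  same_group = groupfun(m1) == groupfun(m2)
  ifelse(m1 == m2, "Match",
         ifelse(same_group, "SAMEGROUP", "DIFFGROUP"))
}
```

Because `groupfun()` assigns into a copy of its input, `groupfun(m1) == groupfun(m2)` is itself a matrix, so only two `ifelse()` levels are needed.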