I want to compare 2 data frames. (I have already asked this question here, but I am wording it differently to be more efficient: How to find differences in elements of 2 data frames based on 2 unique identifiers)
df1<-data.frame(DS.ID=c(123,214,543,325,123,214),OP.ID=c("xxab","xxac","xxad","xxae","xxaf","xxaq"),P.ID=c("AAC","JGK","DIF","ADL","AAC","JGR"),date="20121111")
> df1
DS.ID OP.ID P.ID date
1 123 xxab AAC 20121111
2 214 xxac JGK 20121111
3 543 xxad DIF 20121111
4 325 xxae ADL 20121111
5 123 xxaf AAC 20121111
6 214 xxaq JGR 20121111
df2<-data.frame(DS.ID=c(123,214,543,325,123,214),OP.ID=c("xxab","xxac","xxad","xxae","xxaf","xxaq"),P.ID=c("AAC","JGK","DIF","ADL","AAC","JGS"),date="20121110")
> df2
DS.ID OP.ID P.ID date
1 123 xxab AAC 20121110
2 214 xxac JGK 20121110
3 543 xxad DIF 20121110
4 325 xxae ADL 20121110
5 123 xxaf AAC 20121110
6 214 xxaq JGS 20121110
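The unique ID is the combination of DS.ID and OP.ID, so DS.ID can repeat but the combination of DS.ID and OP.ID cannot. I want to find the instances where P.ID changes. Also, a given DS.ID/OP.ID combination is not necessarily in the same row in both data frames.
So, first I create a single data frame, which I then want to melt and dcast. I would like to end up with the DS.ID and OP.ID columns as the unique ID, followed by one column per date holding the value for that date.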
library(plyr)  # rbind.fill comes from the plyr package
df12 <- rbind.fill(df1,df2)
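The reshaping described above could be sketched roughly as follows. This is only an illustration, not a confirmed solution: it assumes the reshape2 package is installed, uses base rbind instead of rbind.fill (both frames have the same columns), and sets stringsAsFactors = FALSE so that P.ID and date stay plain character vectors. The names wide and changed are made up for the example.

```r
# Sketch only: assumes the reshape2 package is available.
library(reshape2)

df1 <- data.frame(DS.ID = c(123, 214, 543, 325, 123, 214),
                  OP.ID = c("xxab", "xxac", "xxad", "xxae", "xxaf", "xxaq"),
                  P.ID  = c("AAC", "JGK", "DIF", "ADL", "AAC", "JGR"),
                  date  = "20121111", stringsAsFactors = FALSE)
df2 <- data.frame(DS.ID = c(123, 214, 543, 325, 123, 214),
                  OP.ID = c("xxab", "xxac", "xxad", "xxae", "xxaf", "xxaq"),
                  P.ID  = c("AAC", "JGK", "DIF", "ADL", "AAC", "JGS"),
                  date  = "20121110", stringsAsFactors = FALSE)

# Stack the two frames; the data is already "long" (one row per ID pair and date).
df12 <- rbind(df1, df2)

# One row per DS.ID/OP.ID pair, one column per date, cells hold P.ID.
wide <- dcast(df12, DS.ID + OP.ID ~ date, value.var = "P.ID")

# Keep the rows where P.ID differs between the two dates.
changed <- wide[wide[["20121110"]] != wide[["20121111"]], ]
print(changed)
```

Because the date values become column names, they are not syntactic R names and have to be accessed with [[ ]] or backticks.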
Answer 0 (score: 2)
If all you want is to find where P.ID differs, you can merge on the two common columns and then compare:
# Convert from factor to character.
df1$P.ID<-as.character(df1$P.ID)
df2$P.ID<-as.character(df2$P.ID)
# Merge
compare.df<-merge(df1,df2,by=c('DS.ID','OP.ID'))
# Find differences.
compare.df[compare.df$P.ID.x!=compare.df$P.ID.y,]
# DS.ID OP.ID P.ID.x date.x P.ID.y date.y
# 4 214 xxaq JGR 20121111 JGS 20121110