Compare two data frames

Time: 2013-11-20 14:45:39

Tags: r

I have two data frames of the following type:

df1
#  ProbeName   Tglult
#1  PDKT_001       NA
#2  PDKT_002       NA
#3  PDKT_003 676.2108
#4  PDKT_004       NA
#5  PDKT_005 724.9720
#6  PDKT_006       NA

df2
# ProbeName    Pglult
#1  PDKT_001        NA
#2  PDKT_002        NA
#3  PDKT_003  648.9933
#4  PDKT_004        NA
#5  PDKT_005        NA
#6  PDKT_006   15.0673

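I want to see which rows have the same ProbeName and a value in the second column of both data frames (I don't mind what the value is), and which do not. So this should be the expected result: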

# [1] "PDKT_003" # Both data frames have a value for this row
# With another option, get the ones that have a value in df1 but not in df2
# [1] "PDKT_005"
# With another option, get the ones that have a value in df2 but not in df1
# [1] "PDKT_006"

How can I do this?

At first I thought merge would do it, and I tried:

#Which are the ones in common and the ones in the df2 that are not in df1
probe <- merge(x=df2[!is.na(df2[,2]),], y=df1[!is.na(df1[,2]),],
by.x="ProbeName", by.y="ProbeName", all.x=TRUE) 
probe[, 1]
# [1] PDKT_003  PDKT_006

#The ones in common
probe2 <- merge(x=df2[!is.na(df2[,2]),], y=df1[!is.na(df1[,2]),],
by.x="ProbeName", by.y="ProbeName")
probe2[, 1]
# [1] PDKT_003

#Which are the ones in common and the ones in the df1 that are not in df2
probe3 <- merge(x=df2[!is.na(df2[,2]),], y=df1[!is.na(df1[,2]),],
by.x="ProbeName", by.y="ProbeName", all.y=TRUE)
probe3[, 1]
# [1] PDKT_003  PDKT_005

But I think there is now a better way, using match:

 common <- match(probe2[,1], probe3[,1])
 probe3[,1][!is.na(common)]

I don't know whether I'm missing something (I have been at this for a while, so this may be wrong).

1 answer:

Answer 0: (score: 0)

Assuming, as in the example you provided, that each ProbeName is unique and appears in the same row of both data frames:

> d1 <- data.frame(a=1:6, v=c(NA,NA,1,NA,1,NA))
> d2 <- data.frame(a=1:6, v=c(NA,NA,1,NA,NA,1))

> !is.na(d1$v) & !is.na(d2$v)
[1] FALSE FALSE  TRUE FALSE FALSE FALSE

> d1[!is.na(d1$v) & !is.na(d2$v),"a"]
[1] 3

> d1[!is.na(d1$v) & is.na(d2$v),"a"]
[1] 5

> d1[is.na(d1$v) & !is.na(d2$v),"a"]
[1] 6
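
If the two data frames are not guaranteed to be row-aligned, the same three answers can probably be obtained with set operations on the ProbeName values instead of positional indexing. The following is only a minimal sketch: df1 and df2 are rebuilt here from the printed output in the question, and the variable names has1, has2, in_both, only_df1 and only_df2 are made up for illustration.

```r
# Example data rebuilt from the question's printed output (an assumption)
df1 <- data.frame(
  ProbeName = sprintf("PDKT_%03d", 1:6),
  Tglult = c(NA, NA, 676.2108, NA, 724.9720, NA),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ProbeName = sprintf("PDKT_%03d", 1:6),
  Pglult = c(NA, NA, 648.9933, NA, NA, 15.0673),
  stringsAsFactors = FALSE
)

# ProbeNames that have a non-NA value in the second column
has1 <- df1$ProbeName[!is.na(df1[[2]])]
has2 <- df2$ProbeName[!is.na(df2[[2]])]

in_both  <- intersect(has1, has2)  # value in both data frames
only_df1 <- setdiff(has1, has2)    # value in df1 but not in df2
only_df2 <- setdiff(has2, has1)    # value in df2 but not in df1

in_both
# [1] "PDKT_003"
only_df1
# [1] "PDKT_005"
only_df2
# [1] "PDKT_006"
```

Because intersect and setdiff compare the names themselves, this does not depend on the row order, and it still works if a ProbeName is present in only one of the data frames.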