I have two data frames of the following form:
df1
# ProbeName Tglult
#1 PDKT_001 NA
#2 PDKT_002 NA
#3 PDKT_003 676.2108
#4 PDKT_004 NA
#5 PDKT_005 724.9720
#6 PDKT_006 NA
df2
# ProbeName Pglult
#1 PDKT_001 NA
#2 PDKT_002 NA
#3 PDKT_003 648.9933
#4 PDKT_004 NA
#5 PDKT_005 NA
#6 PDKT_006 15.0673
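I want to find which rows share the same ProbeName and have a value in the second column of both data frames (I don't care what the value is), and which ones do not. So this would be the expected result:
# [1] "PDKT_003" #Both data frames have a value for this row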
#With another option get the ones which are in df1 and not df2
# [1] "PDKT_005"
#With another option get the ones which are in df2 and not in df1
# [1] "PDKT_006"
How can I do this?
At first I thought merge would do it, and I tried:
#Which are the ones in common and the ones in the df2 that are not in df1
probe <- merge(x=df2[!is.na(df2[,2]),], y=df1[!is.na(df1[,2]),],
by.x="ProbeName", by.y="ProbeName", all.x=TRUE)
probe[, 1]
# [1] PDKT_003 PDKT_006
#The ones in common
probe2 <- merge(x=df2[!is.na(df2[,2]),], y=df1[!is.na(df1[,2]),],
by.x="ProbeName", by.y="ProbeName")
probe2[, 1]
# [1] PDKT_003
#The ones in common and the ones in df1 that are not in df2
probe3 <- merge(x=df2[!is.na(df2[,2]),], y=df1[!is.na(df1[,2]),],
by.x="ProbeName", by.y="ProbeName", all.y=TRUE)
probe3[, 1]
# [1] PDKT_003 PDKT_005
But I think there may be a better way, using match:
common <- match(probe2[,1], probe3[,1])
probe3[,1][!is.na(common)]
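I don't know whether I am missing something (I have been at this for a while, so this may be wrong).
Answer 0 (score: 0)
Assuming, as in your example, that every ProbeName is unique and that the rows are in the same order in both data frames: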
> d1 <- data.frame(a=1:6, v=c(NA,NA,1,NA,1,NA))
> d2 <- data.frame(a=1:6, v=c(NA,NA,1,NA,NA,1))
> !is.na(d1$v) & !is.na(d2$v)
[1] FALSE FALSE TRUE FALSE FALSE FALSE
> d1[!is.na(d1$v) & !is.na(d2$v),"a"]
[1] 3
> d1[!is.na(d1$v) & is.na(d2$v),"a"]
[1] 5
> d1[is.na(d1$v) & !is.na(d2$v),"a"]
[1] 6
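If the two data frames cannot be relied on to have their rows in the same order, the same three groups can probably be obtained with set operations on the ProbeName values instead of row-wise logical indexing. The following is only a sketch: it rebuilds df1 and df2 from the values shown in the question, and it assumes that ProbeName is a character column and that each name appears at most once per data frame. The names has1, has2, both, only_df1 and only_df2 are made up for illustration.

```r
# Rebuild the question's data frames (values copied from the question).
df1 <- data.frame(
  ProbeName = sprintf("PDKT_%03d", 1:6),
  Tglult = c(NA, NA, 676.2108, NA, 724.9720, NA),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ProbeName = sprintf("PDKT_%03d", 1:6),
  Pglult = c(NA, NA, 648.9933, NA, NA, 15.0673),
  stringsAsFactors = FALSE
)

# ProbeNames that have a non-NA value in the second column of each frame.
has1 <- df1$ProbeName[!is.na(df1[[2]])]
has2 <- df2$ProbeName[!is.na(df2[[2]])]

both     <- intersect(has1, has2)  # value present in both data frames
only_df1 <- setdiff(has1, has2)    # value present in df1 but not in df2
only_df2 <- setdiff(has2, has1)    # value present in df2 but not in df1

print(both)
print(only_df1)
print(only_df2)
```

Because intersect and setdiff compare the names themselves, this does not depend on row position, so it should also work when one data frame has extra or reordered probes.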