Find rows that are in data.frame 1 but not in data.frame 2

Time: 2013-10-22 16:35:35

Tags: r dataframe diff

I have a data.frame (Data) and a subset of that data.frame (Data2):

set.seed(1)
Data <- data.frame(id = seq(1, 10), 
  Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE), 
  Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE), 
  Diag3 = sample(c("G123", "H123", "I123"), 10, replace = TRUE), 
  Diag4 = sample(c("A123", "B123", "C123"), 10, replace = TRUE), 
  Diag5 = sample(c("J123", "K123", "L123"), 10, replace = TRUE), 
  Diag6 = sample(c("M123", "N123", "O123"), 10, replace = TRUE), 
  Diag7 = sample(c("P123", "Q123", "R123"), 10, replace = TRUE))

Data2 <- Data[1:4,]

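How can I get the "difference" between the two data.frames? I'm looking for the rows that are in Data but not in Data2.

I thought something like Data[!Data2] should work, but it doesn't.

Thanks!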

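3 Answers:

Answer 0: (score: 5)

I think you are using a data.table construct on a data.frame. Converting both objects to data.table and keying them on all columns should work: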

library(data.table)
Data <- data.table(Data)
Data2 <- data.table(Data2)

setkeyv(Data,colnames(Data))
setkeyv(Data2,colnames(Data2))

Data[!Data2]
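
As a variation on the above, here is a minimal sketch of the same not-join without setting keys. It assumes a data.table version that supports the `on=` argument (1.9.6 or later, as far as I know), and the small table is a hypothetical stand-in for the question's data, not the original.

```r
library(data.table)

# Hypothetical stand-in for the question's tables.
Data  <- data.table(id    = 1:5,
                    Diag1 = c("A123", "B123", "C123", "A123", "B123"))
Data2 <- Data[1:2]

# Not-join on every column; no setkey() call, so Data keeps its row order.
result <- Data[!Data2, on = names(Data)]
print(result)
```

Because nothing is keyed, Data is not re-sorted by reference, which may matter if the original row order is meaningful.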

Answer 1: (score: 4)

data.table keys are your (best!) friend

library(data.table)

Data  <- as.data.table(Data)
Data2 <- as.data.table(Data2)

## set whichever cols make sense as keys
setkey(Data, Diag1, Diag2, Diag3)  
## or to set all columns as key, use  
#  setkey(Data)

## Same key for Data2
setkey(Data2, Diag1, Diag2, Diag3)  
## or 
# setkeyv(Data2, key(Data))  # <~ Note: Use setkeyv for strings


Data[!.(Data2)]

   id Diag1 Diag2 Diag3 Diag4 Diag5 Diag6 Diag7
1:  5  A123  F123  G123  C123  K123  M123  Q123
2: 10  A123  F123  H123  B123  L123  N123  R123
3:  9  B123  E123  I123  C123  L123  N123  P123
4:  6  C123  E123  H123  C123  L123  M123  P123
5:  7  C123  F123  G123  C123  J123  M123  Q123
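
One thing to keep in mind: a keyed not-join compares only the key columns. That would explain why the output above has five rows rather than six (id 8 is missing, presumably because it shares Diag1 to Diag3 with one of the first four rows). A small sketch of the effect, using a hypothetical table and assuming `fsetdiff()` is available (data.table 1.9.8 or later, I believe) for the whole-row comparison:

```r
library(data.table)

# Hypothetical data: row 3 repeats the Diag values of row 1 but has its own id.
Data <- data.table(id    = 1:4,
                   Diag1 = c("A123", "B123", "A123", "C123"),
                   Diag2 = c("D123", "E123", "D123", "F123"))
Data2 <- Data[1:2]

setkey(Data,  Diag1, Diag2)
setkey(Data2, Diag1, Diag2)

# Keyed not-join: only Diag1 and Diag2 are compared, so id 3 is dropped too.
by_key <- Data[!Data2]

# Whole-row set difference: every column is compared, so id 3 is kept.
by_row <- fsetdiff(Data, Data2)

print(by_key)
print(by_row)
```

If the goal is "rows of Data that do not appear in Data2 at all", keying on all columns (or comparing whole rows) is the safer choice.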

Answer 2: (score: 1)

This will solve your exact problem, and it can be generalized using the count function from plyr:

library(plyr)
df <- as.data.frame(rbind(Data, Data2)) # rbind data sets
df <- count(df, vars = names(df))       # count frequency of rows
subset(df, freq < 2)                    # subset the data.frame when freq < 2
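
For comparison, a base R sketch that needs no packages. It is not part of the answer above; the table and the helper `row_key()` are hypothetical names introduced for illustration. The first variant assumes `id` uniquely identifies a row, as it does in the question; the second compares whole rows by pasting each row into a single string.

```r
# Hypothetical stand-in for the question's data.frames.
Data <- data.frame(id    = 1:5,
                   Diag1 = c("A123", "B123", "C123", "A123", "B123"),
                   stringsAsFactors = FALSE)
Data2 <- Data[1:2, ]

# Variant 1: id is unique, so comparing ids is enough.
by_id <- Data[!(Data$id %in% Data2$id), ]

# Variant 2: compare whole rows via one string per row.
# "\r" is used as a separator that is unlikely to occur in the data.
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))
by_row  <- Data[!(row_key(Data) %in% row_key(Data2)), ]

print(by_id)
print(by_row)
```

Unlike the frequency-count approach, both variants return only rows of Data (not rows unique to Data2) and do not add a freq column.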