I have a data.frame (Data) and a subset of that data.frame (Data2):
set.seed(1)
Data <- data.frame(id = seq(1, 10),
                   Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
                   Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE),
                   Diag3 = sample(c("G123", "H123", "I123"), 10, replace = TRUE),
                   Diag4 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
                   Diag5 = sample(c("J123", "K123", "L123"), 10, replace = TRUE),
                   Diag6 = sample(c("M123", "N123", "O123"), 10, replace = TRUE),
                   Diag7 = sample(c("P123", "Q123", "R123"), 10, replace = TRUE))
Data2 <- Data[1:4,]
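How can I get the "difference" between the two data.frames? I am looking for the rows that are in Data but not in Data2.
I thought something like Data[!Data2] should work, but it does not. Thanks!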
Answer 0 (score: 5)
I think you are using a data.table construct on a data.frame. This should work:
library(data.table)
Data <- data.table(Data)
Data2 <- data.table(Data2)
setkeyv(Data,colnames(Data))
setkeyv(Data2,colnames(Data2))
Data[!Data2]
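
If converting to data.table is not an option, a base R filter on an identifying column may be enough. The following is only a minimal sketch: it assumes that id uniquely identifies each row (true for the example data, but not stated as a general guarantee in the question), it uses a shortened version of the example data, and only_in_data is a name chosen here for illustration.

```r
# Shortened version of the example data (only two Diag columns)
set.seed(1)
Data <- data.frame(id = seq(1, 10),
                   Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
                   Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE))
Data2 <- Data[1:4, ]

# Assumption: id is unique per row, so it can serve as the matching key.
# Keep the rows of Data whose id does not occur in Data2.
only_in_data <- Data[!(Data$id %in% Data2$id), ]
only_in_data$id
```

Because the comparison only looks at id, two rows with the same id but different Diag values would be treated as the same row.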
Answer 1 (score: 4)
data.table keys are your (best!) friend:
library(data.table)
Data <- as.data.table(Data)
Data2 <- as.data.table(Data2)
## set whichever cols make sense as keys
setkey(Data, Diag1, Diag2, Diag3)
## or to set all columns as key, use
# setkey(Data)
## Same key for Data2
setkey(Data2, Diag1, Diag2, Diag3)
## or
# setkeyv(Data2, key(Data)) # <~ Note: Use setkeyv for strings
Data[!.(Data2)]
id Diag1 Diag2 Diag3 Diag4 Diag5 Diag6 Diag7
1: 5 A123 F123 G123 C123 K123 M123 Q123
2: 10 A123 F123 H123 B123 L123 N123 R123
3: 9 B123 E123 I123 C123 L123 N123 P123
4: 6 C123 E123 H123 C123 L123 M123 P123
5: 7 C123 F123 G123 C123 J123 M123 Q123
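
As a possible alternative to setting keys, more recent data.table releases accept an on= argument for ad hoc joins and also provide fsetdiff() for set differences between tables. The sketch below assumes such a version is installed and uses a shortened version of the example data; anti and same are names chosen here for illustration.

```r
library(data.table)

# Shortened version of the example data (only two Diag columns)
set.seed(1)
Data <- data.table(id = 1:10,
                   Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
                   Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE))
Data2 <- Data[1:4]

# Anti-join on all columns without setting a key first
anti <- Data[!Data2, on = names(Data)]

# Set difference of whole rows; note that fsetdiff drops duplicate rows by default
same <- fsetdiff(Data, Data2)

anti$id
same$id
```

Joining on every column means a row only counts as "in Data2" if all of its values match, which differs from the three-column key used above.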
Answer 2 (score: 1)
This will solve your exact problem, and it can be generalized, using the count function from plyr:
library(plyr)
df <- as.data.frame(rbind(Data, Data2)) # rbind data sets
df <- count(df, vars = names(df)) # count frequency of rows
subset(df, freq < 2) # subset the data.frame when freq < 2
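
The same "stack the tables and keep rows that occur once" idea can be sketched in base R with duplicated(), in case plyr is not available. This relies on the same assumptions as the count approach: Data2 is a subset of Data and Data itself has no repeated rows. The example uses a shortened version of the data, and stacked and once are names chosen here for illustration.

```r
# Shortened version of the example data (only two Diag columns)
set.seed(1)
Data <- data.frame(id = seq(1, 10),
                   Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
                   Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE))
Data2 <- Data[1:4, ]

# Stack both tables, then flag every row that has a duplicate in either direction
stacked <- rbind(Data, Data2)
is_dup <- duplicated(stacked) | duplicated(stacked, fromLast = TRUE)

# Rows that occur exactly once are the ones missing from Data2
once <- stacked[!is_dup, ]
once$id
```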