Question

I have a dataset with 250,000+ rows.

It has three columns: country, test and test_result (character, character, numeric).

The following line of code reduces my data to 102,388 rows:

sub.df1 <- df <- df[!duplicated(df), ]

This line of code reduces it to 102,339 rows:

sub.df2 <- unique(df[,c('country','test')])

Now I want to see those 49 rows: the rows in sub.df1 that have the same country and test but a different test_result.
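To make the arithmetic concrete, here is a minimal sketch on hypothetical toy data (the values are invented, only the column names come from the question). It shows that the gap between the two row counts is exactly the number of extra rows whose country/test pair repeats with a different test_result.

```r
# Hypothetical toy data with the same three columns as in the question
df <- data.frame(
  country     = c("BE", "BE", "BE", "NL", "NL", "FR"),
  test        = c("t1", "t1", "t2", "t1", "t1", "t1"),
  test_result = c(1.5,  1.5,  2.0,  3.0,  4.0,  5.0),
  stringsAsFactors = FALSE
)

df <- df[!duplicated(df), ]                          # drop fully identical rows
n.rows  <- nrow(df)                                  # rows left after dedup
n.pairs <- nrow(unique(df[, c("country", "test")]))  # distinct country/test pairs

# The gap counts rows whose pair repeats with a different test_result
n.rows - n.pairs
```

Here the exact duplicate BE/t1/1.5 is removed first, leaving 5 rows but only 4 distinct pairs; the single extra row comes from NL/t1 having two different results.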

I tried subtracting them, sub.df1[1:2] - sub.df2 = sub.df3, where sub.df3 would be the 49 combinations of country and test that occur more than once in sub.df1.

I also tried a few other approaches to get there: merge(), match(), table(), rle(), but none of them seemed suited to my problem.

Kind regards, Brecht

Answer 1

If you only want the difference (the extra rows), you can use duplicated:

df[duplicated(df[, c('country', 'test')]), ]
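A small illustration of what this returns, on hypothetical toy data that is already deduplicated (as df is after the question's first line): only the second and later occurrences of each country/test pair are flagged, so the first row of a repeated pair is not included.

```r
# Hypothetical toy data, already free of fully identical rows
df <- data.frame(
  country     = c("BE", "BE", "NL", "NL", "FR"),
  test        = c("t1", "t2", "t1", "t1", "t1"),
  test_result = c(1.5, 2.0, 3.0, 4.0, 5.0),
  stringsAsFactors = FALSE
)

# TRUE only for the 2nd, 3rd, ... occurrence of each country/test pair
is.extra <- duplicated(df[, c("country", "test")])
extra <- df[is.extra, ]
extra
```

Only the NL/t1 row with test_result 4 comes back; its partner with test_result 3 is treated as the "original", which is why getting all the duplicates needs a different approach.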

If you want all of the duplicates, you can also use data.table, for example:

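library(data.table)
setDT(df)                            # convert df to a data.table by reference
setkeyv(df, c('country', 'test'))    # key on the grouping columns
# pairs that occur more than once; unique() so a pair seen 3+ times is joined only once
dup.keys <- unique(df[duplicated(df, by = c('country', 'test')), list(country, test)])
df[dup.keys, ]                       # keyed join returns every row of those pairs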
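For comparison, a base R sketch of the same "all duplicates" result without data.table, on hypothetical toy data. It relies on the fromLast argument of duplicated(): a row belongs to a repeated pair if it is flagged either when scanning from the top or when scanning from the bottom.

```r
# Hypothetical toy data; NL/t1 occurs three times with different results
df <- data.frame(
  country     = c("BE", "BE", "NL", "NL", "NL", "FR"),
  test        = c("t1", "t2", "t1", "t1", "t1", "t1"),
  test_result = c(1.5, 2.0, 3.0, 4.0, 6.0, 5.0),
  stringsAsFactors = FALSE
)

key <- df[, c("country", "test")]
# OR-ing both scan directions also catches the first occurrence of each pair
in.dup.group <- duplicated(key) | duplicated(key, fromLast = TRUE)
all.dups <- df[in.dup.group, ]
all.dups <- all.dups[order(all.dups$country, all.dups$test), ]  # group pairs together
all.dups
```

All three NL/t1 rows are returned. Unlike the keyed join, this keeps df a plain data.frame and does not reorder it; the order() line only groups the pairs for readability.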