Question

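I want to remove rows from my data.frame whose combination of values is not repeated >= 4 times in the data frame. In this example, I only want rows 1, 2, 6 and 7, because the values IR, IR_OSR, 2 & hello are repeated 4 times.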

> DB[1:7,c("MegaSite","General.location","ID","call.type")]
  MegaSite General.location ID call.type
1       IR           IR_OSR  2     hello
2       IR           IR_OSR  2     hello
3       IR           IR_OSR  M         x
4       IR           IR_OSR  M         x
5       IR           IR_OSR  M         z
6       IR           IR_OSR  2     hello
7       IR           IR_OSR  2     hello
> dim(DB)
[1] 25434    76

I tried the following code from another recent question (Finding value pairs that occur more than once in a data.table in R):

>DB[,.N>3 , list("MegaSite","General.location","ID","call.type")]

However, I get this error:

Error in drop && !has.j : invalid 'x' type in 'x && y'

Here is a link to a larger sample data set that contains only the relevant columns from my actual data set: DB_IRsample.txt

Answer 1

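Try this code, which counts the rows for each combination of the four columns and keeps the combinations that occur at least 4 times: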

> require(plyr)
> result <- ddply(r,.(MegaSite,General.location,ID,call.type),nrow)
> result <- result[result$V1 >= 4, ]
> result
  MegaSite General.location ID call.type V1
1       IR           IR_OSR  2     hello  4

You can then merge the original data with this result to filter out the rows whose combination does not occur at least 4 times:

> merge(r, result, all.y=TRUE, by=c("MegaSite", "General.location", "ID", "call.type"))
  MegaSite General.location ID call.type V1
1       IR           IR_OSR  2     hello  4
2       IR           IR_OSR  2     hello  4
3       IR           IR_OSR  2     hello  4
4       IR           IR_OSR  2     hello  4
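
As an alternative to the plyr and merge steps above, the same filter can be sketched in base R. This is a minimal sketch, not the answerer's method: the data frame r below is a toy reconstruction of the seven rows shown in the question, and the names key_cols, group, group_size and kept are made up for illustration.

```r
# Toy data mirroring the seven rows printed in the question (an assumption;
# the real DB has 25434 rows and 76 columns).
r <- data.frame(
  MegaSite         = rep("IR", 7),
  General.location = rep("IR_OSR", 7),
  ID               = c("2", "2", "M", "M", "M", "2", "2"),
  call.type        = c("hello", "hello", "x", "x", "z", "hello", "hello"),
  stringsAsFactors = FALSE
)

# Columns whose combined values define a group.
key_cols <- c("MegaSite", "General.location", "ID", "call.type")

# One group label per row, built from the combination of the key columns.
group <- interaction(r[key_cols], drop = TRUE)

# ave() returns, for every row, the size of the group that row belongs to.
group_size <- ave(seq_len(nrow(r)), group, FUN = length)

# Keep only rows whose combination occurs at least 4 times.
kept <- r[group_size >= 4, ]
kept  # rows 1, 2, 6 and 7 of the toy data
```

Unlike merge, this keeps the original row order and row names. The data.table syntax tried in the question presumably fails because DB is a plain data.frame: there the third argument of [ is interpreted as drop, which would explain why the error message mentions drop. Converting first (for example with data.table::setDT) and passing unquoted column names in by should avoid that.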

Subset data.frame based on a threshold number of value combinations
