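I want to remove rows from my data.frame whose combination of values (across the four columns shown below) does not occur at least 4 times in the data frame. In this example I only want rows 1, 2, 6 and 7, because the combination IR, IR_OSR, 2 and hello is repeated 4 times.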
> DB[1:5,c("MegaSite","General.location","ID","call.type")]
MegaSite General.location ID call.type
1 IR IR_OSR 2 hello
2 IR IR_OSR 2 hello
3 IR IR_OSR M x
4 IR IR_OSR M x
5 IR IR_OSR M z
6 IR IR_OSR 2 hello
7 IR IR_OSR 2 hello
> dim(DB)
[1] 25434 76
I tried the following code from another recent question (Finding value pairs that occur more than once in a data.table in R):
>DB[,.N>3 , list("MegaSite","General.location","ID","call.type")]
However, I get this error:
Error in drop && !has.j : invalid 'x' type in 'x && y'
Here is a link to a larger sample dataset that contains only the relevant columns from my actual dataset: DB_IRsample.txt
Answer 0 (score: 1)
Try this code (here r is the data frame holding your data):
> require(plyr)
> result <- ddply(r,.(MegaSite,General.location,ID,call.type),nrow)
> result <- result[result$V1 >= 4, ]
> result
MegaSite General.location ID call.type V1
1 IR IR_OSR 2 hello 4
You can then merge the original data with this result to filter out the rows whose combination does not occur at least 4 times:
> merge(r, result, all.y=TRUE, by=c("MegaSite", "General.location", "ID", "call.type"))
MegaSite General.location ID call.type V1
1 IR IR_OSR 2 hello 4
2 IR IR_OSR 2 hello 4
3 IR IR_OSR 2 hello 4
4 IR IR_OSR 2 hello 4
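As a side note, the error in the question most likely comes from DB being a plain data.frame rather than a data.table: with a data.frame, the third argument inside [ ] is taken as drop, which would explain the complaint about drop && !has.j. If you would rather not load a package at all, the same filter can be sketched in base R with ave(). The toy data below only mirrors the seven rows printed in the question, and the names keys, group_id, n and kept are made up for this illustration:

```r
# Toy data mirroring the seven rows shown in the question (illustration only)
DB <- data.frame(
  MegaSite         = rep("IR", 7),
  General.location = rep("IR_OSR", 7),
  ID               = c("2", "2", "M", "M", "M", "2", "2"),
  call.type        = c("hello", "hello", "x", "x", "z", "hello", "hello"),
  stringsAsFactors = FALSE
)

# Columns whose combination of values defines a group
keys <- c("MegaSite", "General.location", "ID", "call.type")

# One factor level per distinct combination that actually occurs in the data
group_id <- interaction(DB[keys], drop = TRUE)

# For every row, the number of rows that share its combination
n <- ave(seq_len(nrow(DB)), group_id, FUN = length)

# Keep only rows whose combination occurs at least 4 times
kept <- DB[n >= 4, ]
print(kept)
```

On this toy data, kept holds rows 1, 2, 6 and 7, and unlike the merge approach the original row names and row order are preserved.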