R: if-else statement with a count; delete the row if count < 2

Asked: 2016-04-05 22:43:16

Tags: r if-statement delete-row

Model<-c("A","A","A","A","A","B","B","B","B","B","C","C","C","C")
Price<-c(12,14,15,13,16,36,32,24,14,15,14,11,24,31)
region<-c("W","E","E","W","W","E","E","E","E","W","W","W","E","W")
dt<-data.frame(Model,Price,region)

   Model Price region
1      A    12      W
2      A    14      E
3      A    15      E
4      A    13      W
5      A    16      W
6      B    36      E
7      B    32      E
8      B    24      E
9      B    14      E
10     B    15      W
11     C    14      W
12     C    11      W
13     C    24      E
14     C    31      W

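I want to delete a row if its region (W or E) occurs only once within that model type. We keep all rows of model A. We delete row 10 because model B has only one W. We also delete row 13 because model C has only one E.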

How can I do this in R? I have about 20,000 observations across thousands of model types. I might need to write a loop.

2 answers:

Answer 0: (score: 4)

Model<-c("A","A","A","A","A","B","B","B","B","B","C","C","C","C")
Price<-c(12,14,15,13,16,36,32,24,14,15,14,11,24,31)
region<-c("W","E","E","W","W","E","E","E","E","W","W","W","E","W")
dt<-data.frame(Model,Price,region)

These rows would be deleted:

dt[!(duplicated(dt[, -2]) | duplicated(dt[, -2], fromLast = TRUE)), ]

#    Model Price region
# 10     B    15      W
# 13     C    24      E

These rows would be kept:

dt[duplicated(dt[, -2]) | duplicated(dt[, -2], fromLast = TRUE), ]

#    Model Price region
# 1      A    12      W
# 2      A    14      E
# 3      A    15      E
# 4      A    13      W
# 5      A    16      W
# 6      B    36      E
# 7      B    32      E
# 8      B    24      E
# 9      B    14      E
# 11     C    14      W
# 12     C    11      W
# 14     C    31      W
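
To see why combining the two duplicated() calls works, it may help to look at the two masks separately. duplicated() flags every occurrence of a Model/region pair except the first, and fromLast = TRUE flags every occurrence except the last, so a pair that occurs exactly once is flagged by neither. A minimal sketch on the example data (the names key, fwd, bwd, and keep are made up for illustration):

```r
Model  <- c("A","A","A","A","A","B","B","B","B","B","C","C","C","C")
Price  <- c(12,14,15,13,16,36,32,24,14,15,14,11,24,31)
region <- c("W","E","E","W","W","E","E","E","E","W","W","W","E","W")
dt <- data.frame(Model, Price, region)

key  <- dt[, c("Model", "region")]        # same columns as dt[, -2]
fwd  <- duplicated(key)                   # TRUE for every occurrence except the first
bwd  <- duplicated(key, fromLast = TRUE)  # TRUE for every occurrence except the last
keep <- fwd | bwd                         # FALSE only for pairs that occur exactly once

which(!keep)  # row positions that would be dropped
# [1] 10 13
```

Because neither mask depends on how many groups there are, this stays a pair of vectorized passes over the data and needs no explicit loop.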

For 20k observations and nearly 5,000 model types:

set.seed(1)
n <- 20000
dd <- data.frame(Model = sample(1:5000, n, TRUE),
                 Price = rpois(n, 15),
                 region = sample(c('E','W'), n, TRUE))

dim(dd[duplicated(dd[, -2]) | duplicated(dd[, -2], fromLast = TRUE), ])
# [1] 17289     3

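If you want more control over the threshold, you could use something like the following, which is nearly as fast, although I only tried it with 200k observations and 10k models. Change the 1 to some other number.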

# ave() returns each row's Model/region group size; seq_len() is only a dummy vector to count
dim(dd[ave(seq_len(nrow(dd)), dd[, -2], FUN = length) > 1, ])
# [1] 17289     3

dt[ave(seq_len(nrow(dt)), dt[, -2], FUN = length) > 1, ]

#    Model Price region
# 1      A    12      W
# 2      A    14      E
# 3      A    15      E
# 4      A    13      W
# 5      A    16      W
# 6      B    36      E
# 7      B    32      E
# 8      B    24      E
# 9      B    14      E
# 11     C    14      W
# 12     C    11      W
# 14     C    31      W
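
If the threshold needs to vary, the ave() idea could be wrapped in a small helper. This is only a sketch under stated assumptions: keep_min_count and min_n are hypothetical names that do not appear in the answer, and the helper assumes the data frame has columns named Model and region.

```r
# Hypothetical helper: keep rows whose (Model, region) pair occurs at least min_n times.
keep_min_count <- function(df, min_n = 2) {
  # For each row, ave() returns the size of that row's Model/region group;
  # seq_len(nrow(df)) is just a dummy vector whose group lengths are counted.
  n <- ave(seq_len(nrow(df)), df$Model, df$region, FUN = length)
  df[n >= min_n, ]
}

Model  <- c("A","A","A","A","A","B","B","B","B","B","C","C","C","C")
Price  <- c(12,14,15,13,16,36,32,24,14,15,14,11,24,31)
region <- c("W","E","E","W","W","E","E","E","E","W","W","W","E","W")
dt <- data.frame(Model, Price, region)

kept   <- keep_min_count(dt)             # default threshold: drops rows 10 and 13
strict <- keep_min_count(dt, min_n = 3)  # stricter: also drops the two A/E rows (2 and 3)
```

Writing the condition as n >= min_n rather than n > 1 is a design choice that makes the minimum group size the explicit parameter.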

Answer 1: (score: 1)

You can create a counter variable and filter on it. Using the dplyr package:

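library(dplyr)

dt <- dt %>%
  # keep only models that occur in more than one region
  group_by(Model) %>%
  filter(n_distinct(region) > 1) %>%
  # then keep only Model/region pairs that occur more than once
  group_by(Model, region) %>%
  filter(n() > 1)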
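
The same "counter variable" idea can also be sketched without dplyr. The following is an illustrative base R variant, not the answer's code: it tabulates Model by region, looks up each row's count, and stores it in a column n (a made-up name) before filtering. Unlike the pipeline above, it applies only the per-pair count and does not separately require a model to appear in more than one region.

```r
Model  <- c("A","A","A","A","A","B","B","B","B","B","C","C","C","C")
Price  <- c(12,14,15,13,16,36,32,24,14,15,14,11,24,31)
region <- c("W","E","E","W","W","E","E","E","E","W","W","W","E","W")
dt <- data.frame(Model, Price, region)

# Count rows for every Model/region combination (plain integer matrix with dimnames)
tab <- unclass(table(dt$Model, dt$region))

# Counter variable: look up each row's count by indexing with (Model, region) name pairs
dt$n <- tab[cbind(as.character(dt$Model), as.character(dt$region))]

# Keep rows whose pair occurs more than once, then drop the helper column
result <- dt[dt$n > 1, c("Model", "Price", "region")]
```

Keeping the count as a visible column can make it easier to inspect which groups are about to be removed before filtering.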