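Filter all rows of a group based on a specific member of the group

Time: 2017-10-26 22:27:25

Tags: r

I want to filter an entire group based on the value in one specified row.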

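In the data below, I want to remove all rows of an ID group based on the value of Metric at Hour == '2'. (Note that I am not trying to filter on two conditions here; I am trying to filter on one condition, but at a specific row.)

Sample data:

ID <- c('A','A','A','A','A','B','B','B','B','C','C')
Hour <- c('0','2','5','6','9','0','2','5','6','0','2')
Metric <- c(3,4,1,6,7,8,8,3,6,1,1)
x <- data.frame(ID, Hour, Metric)

   ID Hour Metric
1   A    0      3
2   A    2      4
3   A    5      1
4   A    6      6
5   A    9      7
6   B    0      8
7   B    2      8
8   B    5      3
9   B    6      6
10  C    0      1
11  C    2      1

I want to filter each ID by its Metric at Hour == '2'. The result should look like this (all rows of ID B are dropped because its Metric > 5 at that hour):

   ID Hour Metric
1   A    0      3
2   A    2      4
3   A    5      1
4   A    6      6
5   A    9      7
10  C    0      1
11  C    2      1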

A dplyr-based solution is preferred, but any help is greatly appreciated.

2 Answers:

Answer 0: (score: 4)

Adapting "How to filter (with dplyr) for all values of a group if variable limit is reached?"

we get:

library(dplyr)

x %>%
    group_by(ID) %>%
    filter(any(Metric[Hour == '2'] <= 5))

# # A tibble: 7 x 3
# # Groups:   ID [2]
#       ID   Hour Metric
#   <fctr> <fctr>  <dbl>
# 1      A      0      3
# 2      A      2      4
# 3      A      5      1
# 4      A      6      6
# 5      A      9      7
# 6      C      0      1
# 7      C      2      1
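To see what that grouped filter is deciding, here is a minimal base R sketch (not part of the original answer) that evaluates the same condition once per ID. The name `keep` is only illustrative, and `stringsAsFactors = FALSE` is an assumption added so the result is the same across R versions.

```r
# Same sample data as in the question
x <- data.frame(
  ID     = c('A','A','A','A','A','B','B','B','B','C','C'),
  Hour   = c('0','2','5','6','9','0','2','5','6','0','2'),
  Metric = c(3,4,1,6,7,8,8,3,6,1,1),
  stringsAsFactors = FALSE
)

# Evaluate the condition once per ID: is the Metric at Hour '2' at most 5?
keep <- sapply(split(x, x$ID), function(g) any(g$Metric[g$Hour == '2'] <= 5))
print(keep)

# A group with no Hour == '2' row would evaluate any(logical(0)),
# which is FALSE, so such a group would be dropped by this condition.
print(any(logical(0)))
```

Here A and C come out TRUE and B comes out FALSE, which matches the rows kept above.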

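Problems of this kind can also be solved by first creating a by-group intermediate variable that flags whether the rows should be kept or dropped.

Approach 1: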

x %>%
    group_by(ID) %>%
    mutate(keep_group = (any(Metric[Hour == '2'] <= 5))) %>%
    ungroup() %>%
    filter(keep_group) %>%
    select(-keep_group)

Approach 2:

groups_to_keep <-
    x %>%
    filter(Hour == '2', Metric <= 5) %>%
    select(ID) %>%
    distinct() # N.B. distinct() keeps each ID once, in order of first appearance (it does not sort)
#    ID
# 1  A
# 2  C

x %>%
    inner_join(groups_to_keep, by = 'ID')
#    ID Hour Metric
# 1  A    0      3
# 2  A    2      4
# 3  A    5      1
# 4  A    6      6
# 5  A    9      7
# 6  C    0      1
# 7  C    2      1

Approach 3 - as suggested by @thelatemail (safe for duplicates in ID):

groups_not_to_keep <-
    x %>% 
    filter(Hour == '2', Metric > 5) %>%
    select(ID)

x %>%
    anti_join(groups_not_to_keep, by = 'ID')
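One thing worth checking is that Approaches 2 and 3 are not quite interchangeable when a group has no Hour == '2' row at all. The sketch below assumes a hypothetical extra group 'D' that is not in the original data; `x2`, `kept_inner` and `kept_anti` are illustrative names only.

```r
library(dplyr)

x <- data.frame(
  ID     = c('A','A','A','A','A','B','B','B','B','C','C'),
  Hour   = c('0','2','5','6','9','0','2','5','6','0','2'),
  Metric = c(3,4,1,6,7,8,8,3,6,1,1),
  stringsAsFactors = FALSE
)

# Hypothetical group 'D' with no Hour == '2' row (an assumption for illustration)
x2 <- rbind(x, data.frame(ID = 'D', Hour = '0', Metric = 9,
                          stringsAsFactors = FALSE))

# Approach 2 style: keep only groups that pass the test at Hour '2'
kept_inner <- x2 %>%
  inner_join(x2 %>% filter(Hour == '2', Metric <= 5) %>% distinct(ID),
             by = 'ID')

# Approach 3 style: drop only groups that fail the test at Hour '2'
kept_anti <- x2 %>%
  anti_join(x2 %>% filter(Hour == '2', Metric > 5) %>% distinct(ID),
            by = 'ID')

unique(kept_inner$ID)  # 'D' is absent: it never passed the test
unique(kept_anti$ID)   # 'D' is present: it never failed the test
```

Which behaviour is right depends on whether a group with no Hour == '2' row should count as passing or failing, and the question does not say.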

Answer 1: (score: 2)

Negating the membership test, !( ... %in% ... ), should work here. Try this:

library(dplyr)
filter(x, Metric > 5 & Hour == '2')$ID # gives B
subset(x, !(ID  %in% filter(x, Metric > 5 & Hour == '2')$ID))
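If dplyr is not available, the same idea can probably be written in base R alone. This is a sketch rather than part of the original answer; `drop_ids` and `result` are illustrative names, and `stringsAsFactors = FALSE` is an assumption added for consistency across R versions.

```r
x <- data.frame(
  ID     = c('A','A','A','A','A','B','B','B','B','C','C'),
  Hour   = c('0','2','5','6','9','0','2','5','6','0','2'),
  Metric = c(3,4,1,6,7,8,8,3,6,1,1),
  stringsAsFactors = FALSE
)

# IDs whose Metric at Hour '2' exceeds 5
drop_ids <- x$ID[x$Metric > 5 & x$Hour == '2']

# Keep every row whose ID is not among them
result <- x[!(x$ID %in% drop_ids), ]
print(result)
```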