How can I use dplyr to remove all rows after a certain point, by group?

Asked: 2016-05-02 19:16:59

Tags: r dplyr

I have a data frame:

test_df <- data.frame(
  x = c(rep("a", 5), rep("b", 5)), 
  y = c(1, 2, NA, 2, 3, NA, 1, 2, 3, 1)
)

Within each group defined by column x, I want to remove all rows that come after the first row where y == 2. Is there a way to do this in dplyr?

The result I want is to go from:

   x  y
1  a  1
2  a  2
3  a NA
4  a  2
5  a  3
6  b NA
7  b  1
8  b  2
9  b  3
10 b  1

to:

   x  y
1  a  1
2  a  2
6  b NA
7  b  1
8  b  2

3 answers:

Answer 0 (score: 7)

How about this?

library(dplyr)

# Within each group of x, keep rows 1 through the first position where y == 2
group_by(test_df, x) %>% slice(seq_len(min(which(y == 2))))
Source: local data frame [5 x 2]
Groups: x [2]

       x     y
  (fctr) (dbl)
1      a     1
2      a     2
3      b    NA
4      b     1
5      b     2
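
To see why the one-liner works, here is a minimal base R sketch of what it computes inside a single group. The vector `y` is just group a from the question, and the names `hits`, `first`, `idx` and `no_hits` are introduced here for illustration only.

```r
# One group's y values (group "a" from the question)
y <- c(1, 2, NA, 2, 3)

hits  <- which(y == 2)   # positions where y == 2; the NA comparison is dropped
first <- min(hits)       # position of the first match
idx   <- seq_len(first)  # positions to keep: 1, ..., first

y[idx]                   # the values that slice() would keep for this group

# A group that never contains 2: which() returns integer(0). min() of an
# empty vector is Inf (with a warning), so seq_len() would then fail.
no_hits <- which(c(1, 3, NA) == 2)
length(no_hits)
```

So the approach assumes every group contains at least one row with y == 2.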

Answer 1 (score: 4)

group_by(df, x) %>%
    mutate(first2 = min(which(y == 2 | row_number() == n()))) %>%
    filter(row_number() <= first2) %>%
    select(-first2)
# Source: local data frame [6 x 2]
# Groups: x [3]
# 
#        x     y
#   (fctr) (int)
# 1      a     1
# 2      a     2
# 3      b    NA
# 4      b     1
# 5      b     2
# 6      c     1

Using this data:

df = structure(list(x = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"), y = c(1L, 2L, 
NA, 2L, 3L, NA, 1L, 2L, 3L, 1L, 1L)), .Names = c("x", "y"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))

Answer 2 (score: 0)

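@DatamineR's solution gave me an error because I have some groups in which y never equals 2. I modified it by putting n() inside the min() call; it now keeps all rows of a group when y never equals 2 in that group.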

test_df <- data.frame(
  x = c(rep("a", 5), rep("b", 5)),
  y = c(1, 2, NA, 2, 3, NA, 1, 3, 3, 1)
)

group_by(test_df, x) %>% slice(seq_len(min(which(y == 2), n())))

# A tibble: 7 x 2
# Groups:   x [2]
  x         y
  <fct> <dbl>
1 a      1.00
2 a      2.00
3 b     NA   
4 b      1.00
5 b      3.00
6 b      3.00
7 b      1.00
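
The same guard can be wrapped in a small helper that returns a logical mask, which may be easier to reuse and to test. This is only a sketch: `keep_through_first` is a hypothetical name, not a dplyr function, and the grouping is done here with base R's `ave()` so the example runs without any packages.

```r
# Hypothetical helper (not part of dplyr): TRUE for every position up to and
# including the first element equal to `value`; all TRUE if nothing matches.
keep_through_first <- function(y, value = 2) {
  hits <- which(y == value)          # which() drops NA comparisons
  last_keep <- min(hits, length(y))  # falls back to the group size if hits is empty
  seq_along(y) <= last_keep
}

test_df <- data.frame(
  x = c(rep("a", 5), rep("b", 5)),
  y = c(1, 2, NA, 2, 3, NA, 1, 3, 3, 1)
)

# Apply the helper within each group of x; ave() returns 0/1 because y is numeric
keep <- as.logical(ave(test_df$y, test_df$x, FUN = keep_through_first))
result <- test_df[keep, ]
result
```

With dplyr loaded, the same helper should also work as `group_by(test_df, x) %>% filter(keep_through_first(y))`.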