I have a data frame:
test_df <- data.frame(
x = c(rep("a", 5), rep("b", 5)),
y = c(1, 2, NA, 2, 3, NA, 1, 2, 3, 1)
)
Within each group defined by column x, I want to delete every row after the first row where y == 2. Is there a way to do this in dplyr?
That is, I want to go from:
x y
1 a 1
2 a 2
3 a NA
4 a 2
5 a 3
6 b NA
7 b 1
8 b 2
9 b 3
10 b 1
to:
x y
1 a 1
2 a 2
6 b NA
7 b 1
8 b 2
Answer 0 (score: 7)
How about this?
library(dplyr)
group_by(test_df, x) %>% slice(seq_len(min(which(y == 2))))
Source: local data frame [5 x 2]
Groups: x [2]
x y
(fctr) (dbl)
1 a 1
2 a 2
3 b NA
4 b 1
5 b 2
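This relies on every group containing at least one y == 2. A quick base R check of what happens otherwise: `which()` returns an empty vector, `min()` of that is `Inf` (with a warning), and `seq_len(Inf)` stops with an error. Adding the group size as an extra argument to `min()` gives a finite cutoff. The vector `y_no_two` is a made-up example group, not data from the question.

```r
# Made-up group in which y is never 2
y_no_two <- c(NA, 1, 3, 3, 1)

idx <- which(y_no_two == 2)        # integer(0): no row matches
cut <- suppressWarnings(min(idx))  # min() of an empty vector is Inf
print(cut)

# seq_len(Inf) is an error; this is the value slice() would get for such a group
res <- tryCatch(seq_len(cut), error = function(e) "error")
print(res)

# Capping the cutoff at the group size keeps every row instead
fixed <- seq_len(min(idx, length(y_no_two)))
print(fixed)
```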
Answer 1 (score: 4)
group_by(df, x) %>%
mutate(first2 = min(which(y == 2 | row_number() == n()))) %>%
filter(row_number() <= first2) %>%
select(-first2)
# Source: local data frame [6 x 2]
# Groups: x [2]
#
# x y
# (fctr) (int)
# 1 a 1
# 2 a 2
# 3 b NA
# 4 b 1
# 5 b 2
# 6 c 1
Using this data:
df = structure(list(x = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"), y = c(1L, 2L,
NA, 2L, 3L, NA, 1L, 2L, 3L, 1L, 1L)), .Names = c("x", "y"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
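For comparison, the same idea (cut each group at its first y == 2, or keep the whole group when there is none) can be sketched in base R without dplyr. This is an illustration, not part of the original answer: the helper name `keep_through_first_2` is hypothetical, and the 11-row `df` above is rebuilt with `data.frame()` instead of `structure()`.

```r
# Same values as the dput() above: groups a, b and c (c has no y == 2)
df <- data.frame(
  x = c(rep("a", 5), rep("b", 5), "c"),
  y = c(1L, 2L, NA, 2L, 3L, NA, 1L, 2L, 3L, 1L, 1L)
)

# Hypothetical helper: TRUE for rows up to and including the first y == 2;
# if the group has no 2, the cutoff falls back to the group's last row
keep_through_first_2 <- function(v) {
  cut <- min(which(v == 2), length(v))
  seq_along(v) <= cut
}

# Apply the helper per group of x, then put the results back in row order
keep <- unsplit(lapply(split(df$y, df$x), keep_through_first_2), df$x)
df[keep, ]
```

Passing `length(v)` to `min()` plays the same role as `row_number() == n()` in the dplyr version: it guarantees a finite cutoff for groups such as c.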
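Answer 2 (score: 0)
@DatamineR's solution gave me an error because some of my groups have no row where y equals 2. I modified it by putting n() inside the min() call; now, when y is never 2 in a group, all rows of that group are kept.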
test_df <- data.frame(
x = c(rep("a", 5), rep("b", 5)),
y = c(1, 2, NA, 2, 3, NA, 1, 3, 3, 1)
)
group_by(test_df, x) %>% slice(seq_len(min(which(y == 2), n())))
# A tibble: 7 x 2
# Groups: x [2]
x y
<fct> <dbl>
1 a 1.00
2 a 2.00
3 b NA
4 b 1.00
5 b 3.00
6 b 3.00
7 b 1.00
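A different technique, not taken from any of the answers above: count the y == 2 rows seen so far in each group and keep a row only while no 2 has occurred before it. Using `%in%` instead of `==` treats NA as "not 2", so `cumsum()` never becomes NA. The sketch below is base R, reuses the modified `test_df` from this answer, and its variable names (`hit`, `before`) are illustrative.

```r
test_df <- data.frame(
  x = c(rep("a", 5), rep("b", 5)),
  y = c(1, 2, NA, 2, 3, NA, 1, 3, 3, 1)
)

# 1 where y is 2, else 0; NA %in% 2 is FALSE, so missing values are safe
hit <- as.integer(test_df$y %in% 2)

# Within each group of x: how many y == 2 rows come strictly before this row
before <- ave(hit, test_df$x, FUN = function(h) cumsum(h) - h)

# Keep rows with no earlier 2; group b (no 2 at all) is kept whole
test_df[before == 0, ]
```

The same condition should also work inside dplyr as `group_by(x) %>% filter(cumsum(y %in% 2) - (y %in% 2) == 0)`, and it needs no special case for groups without a 2.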