df <- data.frame(loc.id = rep(1:2,each = 10), threshold = rep(1:10,times = 2))
For each loc.id, I want to keep the first row where threshold >= 2 and the first row where threshold >= 4. I did this:
df %>% group_by(loc.id) %>% dplyr::filter(row_number() == which.max(threshold >= 2),row_number() == which.max(threshold >= 4))
I expected a data frame like this:
loc.id threshold
1 2
1 4
2 2
2 4
But it returns an empty data frame.
Answer 0 (score: 2)
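Based on the conditions, we can slice the rows at the indices returned by the two which.max calls, concatenated with c() and wrapped in unique (if a group only has thresholds greater than 4, both conditions return the same index, and unique keeps that row from being returned twice):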
df %>%
group_by(loc.id) %>%
filter(any(threshold >= 2)) %>% # additional check
#slice(unique(c(which.max(threshold > 2), which.max(threshold > 4))))
# based on the expected output
slice(unique(c(which.max(threshold >= 2), which.max(threshold >= 4))))
# A tibble: 4 x 2
# Groups: loc.id [2]
# loc.id threshold
# <int> <int>
#1 1 2
#2 1 4
#3 2 2
#4 2 4
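The slice call relies on which.max returning the position of the first TRUE when it is given a logical vector. A minimal base R sketch of that behaviour, using made-up vectors that are not part of the question's data:

```r
# which.max() returns the index of the first maximum of its argument.
# For a logical vector, TRUE > FALSE, so this is the first TRUE.
with_match <- c(1L, 3L, 5L)       # hypothetical thresholds, first value >= 2 is at position 2
idx_match <- which.max(with_match >= 2)

# If every element is FALSE, the first element is already the maximum,
# so which.max() silently returns 1 instead of signalling "no match".
without_match <- c(0L, 1L, 1L)    # hypothetical thresholds, none >= 2
idx_no_match <- which.max(without_match >= 2)

print(idx_match)
print(idx_no_match)
```

The second case is why a group with no qualifying value would otherwise contribute its first row.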
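Note that there may be groups in which no threshold value is greater than or equal to 2. For such a group which.max(threshold >= 2) would return 1, so the filter(any(threshold >= 2)) step keeps only the groups that do contain such a value.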
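As a sketch that stays closer to the attempt in the question: conditions separated by a comma in filter are combined with AND, so a row would have to sit at both positions at once, which no row does. Combining the two comparisons with | instead keeps a row that matches either position. This assumes every group has at least one value meeting each condition, as in the sample data; it does not include the any() guard, and the name res is only illustrative.

```r
library(dplyr)

df <- data.frame(loc.id = rep(1:2, each = 10), threshold = rep(1:10, times = 2))

res <- df %>%
  group_by(loc.id) %>%
  # keep a row if it is the first with threshold >= 2 OR the first with threshold >= 4
  filter(row_number() == which.max(threshold >= 2) |
           row_number() == which.max(threshold >= 4)) %>%
  ungroup()

print(res)
```

On the sample data this keeps the rows with threshold 2 and 4 in each loc.id group.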
Answer 1 (score: 1)
If this is not what you want, specify df under the name below and use it to filter the dataset.
with cte as
(
select time, data,
-- assign a new group number whenever it's not a zero
-- = same number for a value and following zeroes
sum(case when data = 0 then 0 else 1 end)
over (order by time desc -- start from the latest row
rows unbounded preceding) as grp
from myTable
)
select
min(time), max(time)
from cte
group by
grp -- aggregate previous zeroes with the non-zero row