Question

df <- data.frame(loc.id = rep(1:2,each = 10), threshold = rep(1:10,times = 2))

For each loc.id, I want to keep the first row where threshold >= 2 and the first row where threshold >= 4. I did this:

df %>% group_by(loc.id) %>% dplyr::filter(row_number() == which.max(threshold >= 2),row_number() == which.max(threshold >= 4))

I expect a data frame like this:

      loc.id threshold
        1       2
        1       4
        2       2
        2       4

But it returns an empty data frame.

Answer 1

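Based on the conditions, we can slice the rows by concatenating the two which.max indices and taking the unique values (if a group only has threshold values that are all >= 4, both conditions return the same index, so unique keeps a single row).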

df %>%
    group_by(loc.id) %>%
    filter(any(threshold >= 2)) %>% # additional check
    #slice(unique(c(which.max(threshold > 2), which.max(threshold > 4))))
    # based on the expected output
    slice(unique(c(which.max(threshold >= 2), which.max(threshold >= 4))))
# A tibble: 4 x 2
# Groups:   loc.id [2]
#  loc.id threshold
#   <int>     <int>
#1      1         2
#2      1         4
#3      2         2
#4      2         4
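
The attempt in the question returns nothing because filter() combines comma-separated conditions with a logical AND, and no row can be both the first row with threshold >= 2 and the first row with threshold >= 4 (here those are rows 2 and 4 within each group). A minimal sketch of the same idea kept inside filter(), assuming the df from the question and using %in% to match either index:

```r
library(dplyr)

df <- data.frame(loc.id = rep(1:2, each = 10), threshold = rep(1:10, times = 2))

# Comma-separated conditions in filter() are ANDed, so this keeps no rows
empty <- df %>%
  group_by(loc.id) %>%
  filter(row_number() == which.max(threshold >= 2),
         row_number() == which.max(threshold >= 4))

# Keep a row if its position within the group matches either first-match index
res <- df %>%
  group_by(loc.id) %>%
  filter(row_number() %in% c(which.max(threshold >= 2),
                             which.max(threshold >= 4))) %>%
  ungroup()
```

Because %in% tests membership, a group where both indices coincide still yields a single row, which plays the same role as unique() in the slice() version.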

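Note that there may be groups in which no threshold value is greater than or equal to 2. The filter(any(threshold >= 2)) step keeps only the groups that do have such a value.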
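
The any(threshold >= 2) check in the code above matters because which.max() on an all-FALSE logical vector returns 1, so without it the first row of such a group would be picked by mistake. That check does not cover a group that reaches 2 but never reaches 4, though. A sketch of one way to guard each condition separately, assuming a hypothetical helper first_true() (not part of dplyr) and a small made-up data frame df2:

```r
library(dplyr)

# which.max() on an all-FALSE vector reports position 1, not "no match"
which.max(c(FALSE, FALSE, FALSE))  # returns 1

# Hypothetical helper: index of the first TRUE, or integer(0) if there is none
first_true <- function(x) head(which(x), 1)

# Made-up data: group 1 meets both conditions, group 2 only the first, group 3 neither
df2 <- data.frame(loc.id    = c(1, 1, 1, 2, 2, 3, 3),
                  threshold = c(1, 2, 5, 2, 3, 0, 1))

res2 <- df2 %>%
  group_by(loc.id) %>%
  # missing matches contribute no index, so no row is selected for them
  slice(unique(c(first_true(threshold >= 2), first_true(threshold >= 4)))) %>%
  ungroup()
```

With this helper, a group that never satisfies a condition simply contributes no row for it, so the separate any() filter is no longer needed.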

Answer 2

If this is not what you want, assign the df below to a name and use it to filter the dataset.

with cte as 
 (
   select time, data,
      -- assign a new group number whenever it's not a zero
      -- = same number for a value and following zeroes
      sum(case when data = 0 then 0 else 1 end) 
          over (order by time desc -- start from the latest row
                rows unbounded preceding) as grp 
   from myTable
 ) 
select
   min(time), max(time)
from cte
group by 
   grp -- aggregate previous zeroes with the non-zero row

Filter rows based on multiple conditions using dplyr

2 answers: