Different filtering rules by group using dplyr

Time: 2018-05-30 21:31:14

Tags: r filter dplyr

Sample data:

df <- data.frame(loc.id = rep(1:2, each = 11), 
             x = c(35,51,68,79,86,90,92,93,95,98,100,35,51,68,79,86,90,92,92,93,94,94))

For each loc.id, I want to filter out x <= 95:

df %>% group_by(loc.id) %>% filter(row_number() <= which.max(x >= 95))

          loc.id   x
          <int> <dbl>
       1      1    35
       2      1    51
       3      1    68
       4      1    79
       5      1    86
       6      1    90
       7      1    92
       8      1    93
       9      1    95
      10      2    35

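However, the problem is that in group 2 all values are less than 95, so I want to keep all values of x for group 2. The line above does not do that.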

3 answers:

Answer 0: (score: 2)

Maybe something like this?

df %>%
    group_by(loc.id) %>%
    mutate(n = sum(x > 95)) %>%          # number of values above 95 in the group
    filter(n == 0 | (x > 0 & x > 95)) %>% # keep the whole group if none, else only x > 95
    ungroup() %>%
    select(-n)

## A tibble: 13 x 2
#   loc.id     x
#    <int> <dbl>
# 1      1   98.
# 2      1  100.
# 3      2   35.
# 4      2   51.
# 5      2   68.
# 6      2   79.
# 7      2   86.
# 8      2   90.
# 9      2   92.
#10      2   92.
#11      2   93.
#12      2   94.
#13      2   94.

Note that removing entries where x <= 95 corresponds to retaining entries where x > 95 (not x >= 95).

Answer 1: (score: 0)

You can use match to get the index of the first TRUE, and return the length of the group through the nomatch argument if no match is found:

df %>% 
    group_by(loc.id) %>% 
    filter(row_number() <= match(TRUE, x >= 95, nomatch=n()))

# A tibble: 20 x 2
# Groups:   loc.id [2]
#   loc.id     x
#    <int> <dbl>
# 1      1    35
# 2      1    51
# 3      1    68
# 4      1    79
# 5      1    86
# 6      1    90
# 7      1    92
# 8      1    93
# 9      1    95
#10      2    35
#11      2    51
#12      2    68
#13      2    79
#14      2    86
#15      2    90
#16      2    92
#17      2    92
#18      2    93
#19      2    94
#20      2    94
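
To see why the which.max version from the question keeps only one row of group 2 while match keeps all of them, it may help to compare the two on plain logical vectors. This is a minimal base R sketch: the vectors g1 and g2 are stand-ins that mirror the x column of the two groups, and length() is used in place of n(), which is only available inside dplyr verbs.

```r
# Stand-ins for the x column of each group in the question
g1 <- c(35, 51, 68, 79, 86, 90, 92, 93, 95, 98, 100)
g2 <- c(35, 51, 68, 79, 86, 90, 92, 92, 93, 94, 94)

# which.max() returns the position of the first maximum; when every
# element is FALSE, the first FALSE is itself the maximum, so it returns 1
first_hit_g1 <- which.max(g1 >= 95)   # 9
first_hit_g2 <- which.max(g2 >= 95)   # 1, although nothing matches

# match() falls back to the nomatch value when TRUE never occurs
cutoff_g1 <- match(TRUE, g1 >= 95, nomatch = length(g1))  # 9
cutoff_g2 <- match(TRUE, g2 >= 95, nomatch = length(g2))  # 11

# Keep everything up to and including the cutoff position
kept_g2 <- g2[seq_along(g2) <= cutoff_g2]
```

With the fallback set to the group length, the comparison against the row position is TRUE for every row of a group that never reaches 95.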

Or use the negated, lagged cumsum as the filter condition:

df %>% group_by(loc.id) %>% filter(!lag(cumsum(x >= 95), default=FALSE))
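
The cumsum condition can be traced step by step on a single vector. The sketch below uses base R only, so the one-position shift that lag() performs is written out by hand, and the test `== 0` is used instead of `!` to make the intent explicit; the vector x is a stand-in mirroring group 1.

```r
# Stand-in for the x column of group 1 in the question
x <- c(35, 51, 68, 79, 86, 90, 92, 93, 95, 98, 100)

# Running count of values that have reached the threshold so far
hits <- cumsum(x >= 95)

# Shift the counts down by one position, padding the start with 0,
# so the first value that reaches the threshold is itself still kept
hits_before <- c(0, head(hits, -1))

# Keep elements for which no earlier value had reached the threshold
keep <- hits_before == 0
x[keep]
```

For a group whose values never reach 95, every count stays at 0, so all of its rows are kept.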

Answer 2: (score: 0)

A solution using all with the dplyr package can be written as:

library(dplyr)
df %>% group_by(loc.id) %>%
  filter((x > 95) | all(x<=95))  # All x in group are <= 95 OR x > 95

# # Groups: loc.id [2]
# loc.id     x
# <int> <dbl>
# 1      1  98.0
# 2      1 100  
# 3      2  35.0
# 4      2  51.0
# 5      2  68.0
# 6      2  79.0
# 7      2  86.0
# 8      2  90.0
# 9      2  92.0
# 10      2  92.0
# 11      2  93.0
# 12      2  94.0
# 13      2  94.0
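
The reason this works is that all(x <= 95) yields a single TRUE or FALSE per group, which is then recycled against the element-wise test x > 95. As a rough illustration, the same per-group logic can be reproduced without dplyr; this sketch assumes base R's ave() as the grouping mechanism, which is a substitute for group_by() and not part of the answer above.

```r
df <- data.frame(loc.id = rep(1:2, each = 11),
                 x = c(35, 51, 68, 79, 86, 90, 92, 93, 95, 98, 100,
                       35, 51, 68, 79, 86, 90, 92, 92, 93, 94, 94))

# ave() applies the function within each loc.id and returns the results
# in the original row order; the logical values come back as 0/1
flag <- ave(df$x, df$loc.id, FUN = function(x) {
  # all(x <= 95) is one value per group, recycled across its rows
  (x > 95) | all(x <= 95)
})

result <- df[flag == 1, ]
nrow(result)
```

Group 1 contributes only its values above 95, while group 2 is kept whole because none of its values exceed 95.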