Sample data:
df <- data.frame(loc.id = rep(1:2, each = 11),
x = c(35,51,68,79,86,90,92,93,95,98,100,35,51,68,79,86,90,92,92,93,94,94))
For each loc.id, I want to filter out the rows with x <= 95.
df %>% group_by(loc.id) %>% filter(row_number() <= which.max(x >= 95))
loc.id x
<int> <dbl>
1 1 35
2 1 51
3 1 68
4 1 79
5 1 86
6 1 90
7 1 92
8 1 93
9 1 95
10 2 35
However, the problem is that in group 2 all values are below 95, so I want to keep every value of x for group 2. The line above does not do that.
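A minimal base R sketch of why the attempt keeps only one row of group 2: when the logical vector passed to which.max() contains no TRUE, every element ties at the maximum and the index of the first element, 1, is returned. The names x2, hit and cutoff are hypothetical, introduced only for illustration.

```r
# Values of x for loc.id == 2 in the sample data; none reaches 95
x2 <- c(35, 51, 68, 79, 86, 90, 92, 92, 93, 94, 94)
hit <- x2 >= 95

# No TRUE at all, so which.max() falls back to the first position
first_hit <- which.max(hit)

# One possible guard: use the group length when there is no hit
cutoff <- if (any(hit)) which.max(hit) else length(x2)

print(first_hit)  # 1
print(cutoff)     # 11
```

With the guard, row_number() <= cutoff would keep all 11 rows of group 2.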
Answer 0: (score: 2)
Maybe something like this?
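df %>%
  group_by(loc.id) %>%
  mutate(n = sum(x > 95)) %>%      # n = number of values above 95 in the group
  filter(n == 0 | x > 95) %>%      # keep whole group if n == 0, else only x > 95
  ungroup() %>%
  select(-n)                       # drop the helper column
# A tibble: 13 x 2
# loc.id x
# <int> <dbl>
# 1 1 98.
# 2 1 100.
# 3 2 35.
# 4 2 51.
# 5 2 68.
# 6 2 79.
# 7 2 86.
# 8 2 90.
# 9 2 92.
#10 2 92.
#11 2 93.
#12 2 94.
#13 2 94.
Note that removing the entries where x <= 95 corresponds to keeping the entries where x > 95 (not x >= 95).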
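To make the boundary case concrete, here is a small base R sketch (the vector v and the names dropped and kept are made up for illustration): the complement of x <= 95 is x > 95, so a value of exactly 95 is removed.

```r
v <- c(93, 95, 98)

dropped <- v <= 95   # rows to remove
kept <- v > 95       # rows to keep, the exact complement of `dropped`

print(identical(kept, !dropped))  # TRUE
print(v[kept])                    # 98
print(v[v >= 95])                 # 95 98, a different result
```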
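Answer 1: (score: 0)
You can use match to get the index of the first TRUE, and have it return the length of the group via the nomatch argument when no match is found: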
df %>%
group_by(loc.id) %>%
filter(row_number() <= match(TRUE, x >= 95, nomatch=n()))
# A tibble: 20 x 2
# Groups: loc.id [2]
# loc.id x
# <int> <dbl>
# 1 1 35
# 2 1 51
# 3 1 68
# 4 1 79
# 5 1 86
# 6 1 90
# 7 1 92
# 8 1 93
# 9 1 95
#10 2 35
#11 2 51
#12 2 68
#13 2 79
#14 2 86
#15 2 90
#16 2 92
#17 2 92
#18 2 93
#19 2 94
#20 2 94
Or use the negated cumsum as the filter condition:
df %>% group_by(loc.id) %>% filter(!lag(cumsum(x >= 95), default=FALSE))
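The cumsum condition can be traced step by step. The sketch below uses base R only and a made-up vector v; shifting with c(0L, head(seen, -1)) is assumed here as a stand-in for lag(..., default = 0) so that the example runs without dplyr.

```r
v <- c(90, 92, 95, 98, 100)

hit <- v >= 95                          # FALSE FALSE TRUE TRUE TRUE
seen <- cumsum(hit)                     # hits seen so far: 0 0 1 2 3
seen_before <- c(0L, head(seen, -1))    # shifted by one row: 0 0 0 1 2
keep <- seen_before == 0                # TRUE until the row after the first hit

print(v[keep])  # 90 92 95

# A group with no hit keeps every row
w <- c(35, 51, 94)
keep_w <- c(0L, head(cumsum(w >= 95), -1)) == 0
print(w[keep_w])  # 35 51 94
```

Because the counter is shifted by one row, the first row reaching 95 is still kept, and a group that never reaches 95 is kept in full.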
Answer 2: (score: 0)
A solution using all and the dplyr package can be implemented as:
library(dplyr)
df %>% group_by(loc.id) %>%
filter((x > 95) | all(x<=95)) # All x in group are <= 95 OR x > 95
# # Groups: loc.id [2]
# loc.id x
# <int> <dbl>
# 1 1 98.0
# 2 1 100
# 3 2 35.0
# 4 2 51.0
# 5 2 68.0
# 6 2 79.0
# 7 2 86.0
# 8 2 90.0
# 9 2 92.0
# 10 2 92.0
# 11 2 93.0
# 12 2 94.0
# 13 2 94.0