I have a dataset like the one below.
df <- tribble(
~shop_id, ~id, ~key, ~date, ~status,
"1", "10", "abc", '2020-05-04', 'good',
"1", "10", "def", '2020-05-03', 'normal',
"1", "10", "glm", '2020-05-03', 'bad',
"1", "20", "ksr", '2020-05-01', 'bad',
"1", "20", "tyz", '2020-05-02', 'bad',
"2", "20", "uyv", '2020-05-01', 'good',
"2", "20", "mys", '2020-05-01', 'normal',
"2", "30", "ert", '2020-05-01', 'bad',
"2", "40", "yer", '2020-05-05', 'good',
"2", "40", "tet", '2020-05-05', 'bad',
)
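Now I want to filter the data using the following conditions:
Group the data by shop_id and id, then look at the dates. Then:
- If the date where status == 'bad' is the minimum date, delete the rows. For example, the first three rows are removed from the dataset because of this condition (see desired_df).
- If the group has only the 'bad' status, keep all rows. Because of this condition, rows 4 and 5 remain in the desired dataset.
- If the rows with status == 'bad' and the other rows in the group have the same date, keep both rows in the desired dataset.
In other words, after grouping by shop_id and id, I only want to see the rows where the date of the 'bad' status is the maximum. However, when both statuses have the same date, keep the rows.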
desired_df <- tribble(
~shop_id, ~id, ~key, ~date, ~status,
"1", "20", "ksr", '2020-05-01', 'bad',
"1", "20", "tyz", '2020-05-02', 'bad',
"2", "30", "ert", '2020-05-01', 'bad',
"2", "40", "yer", '2020-05-05', 'good',
"2", "40", "tet", '2020-05-05', 'bad',
)
Any help would be greatly appreciated!
Answer 0 (score: 2)
One approach is to use case_when.
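library(dplyr)
library(lubridate)  # provides ymd()

df %>%
  mutate(date = ymd(date)) %>%
  group_by(shop_id, id) %>%
  # each condition yields one flag per group, recycled to every row of that group
  mutate(filter = case_when(all(status != "bad") ~ FALSE,                     # no 'bad' row: drop
                            all(status == "bad") ~ TRUE,                      # only 'bad' rows: keep
                            all(status[date == min(date)] == "bad") ~ FALSE,  # earliest date is all 'bad': drop
                            any(status[date == min(date)] == "good") ~ TRUE,  # 'good' on the earliest date: keep
                            TRUE ~ FALSE)) %>%
  filter(filter == TRUE) %>%
  dplyr::select(-filter)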
# A tibble: 5 x 5
# Groups: shop_id, id [3]
shop_id id key date status
<chr> <chr> <chr> <date> <chr>
1 1 20 ksr 2020-05-01 bad
2 1 20 tyz 2020-05-02 bad
3 2 30 ert 2020-05-01 bad
4 2 40 yer 2020-05-05 good
5 2 40 tet 2020-05-05 bad
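As a rough sketch only, the same case_when logic can probably be collapsed into a single grouped filter(). The helper name keep_group is hypothetical (it is not part of dplyr), and the sketch assumes the dates are ISO-formatted strings that as.Date() can parse, as in the sample data.

```r
library(dplyr)
library(tibble)

df <- tribble(
  ~shop_id, ~id, ~key, ~date, ~status,
  "1", "10", "abc", "2020-05-04", "good",
  "1", "10", "def", "2020-05-03", "normal",
  "1", "10", "glm", "2020-05-03", "bad",
  "1", "20", "ksr", "2020-05-01", "bad",
  "1", "20", "tyz", "2020-05-02", "bad",
  "2", "20", "uyv", "2020-05-01", "good",
  "2", "20", "mys", "2020-05-01", "normal",
  "2", "30", "ert", "2020-05-01", "bad",
  "2", "40", "yer", "2020-05-05", "good",
  "2", "40", "tet", "2020-05-05", "bad"
)

# Hypothetical helper (not a dplyr function): returns a single TRUE/FALSE
# saying whether the whole group is kept.
keep_group <- function(status, date) {
  at_min <- status[date == min(date)]  # statuses on the group's earliest date
  # keep groups that are entirely 'bad', or that contain a 'bad' row
  # and have a 'good' row on the earliest date
  all(status == "bad") || (any(status == "bad") && any(at_min == "good"))
}

result <- df %>%
  mutate(date = as.Date(date)) %>%
  group_by(shop_id, id) %>%
  filter(keep_group(status, date)) %>%  # one flag per group, applied to all its rows
  ungroup()
```

On the sample data, result$key gives the same five keys as desired_df (ksr, tyz, ert, yer, tet). Whether this reduction matches the intended rule for data with other status values is an assumption worth checking against your own cases.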