如何过滤具有多个条件的行?

时间:2020-05-05 22:40:58

标签: r dplyr tidyverse

我有一个数据集,如下所示。

df <- tribble(
  ~shop_id,  ~id,      ~key,        ~date,      ~status, 
  "1",       "10",     "abc",    '2020-05-04',   'good',
  "1",       "10",     "def",    '2020-05-03',   'normal',
  "1",       "10",     "glm",    '2020-05-03',   'bad',
  "1",       "20",     "ksr",    '2020-05-01',   'bad',
  "1",       "20",     "tyz",    '2020-05-02',   'bad',
  "2",       "20",     "uyv",    '2020-05-01',   'good',
  "2",       "20",     "mys",    '2020-05-01',   'normal',
  "2",       "30",     "ert",    '2020-05-01',   'bad',
  "2",       "40",     "yer",    '2020-05-05',   'good',
  "2",       "40",     "tet",    '2020-05-05',   'bad',
)

现在,我要使用以下条件过滤数据:

将数据按shop_idid分组,然后查看日期。然后,

  1. 如果datestatus == 'bad'最小,则删除行。例如,由于这种情况,从数据集中删除了前三行。 (请参阅desired_df)
  2. 如果只有'bad'状态,请保留所有行。由于这种情况,所需数据集中剩下的第4和第5行。
  3. 如果date时各行中的status == 'bad'相同,则将这两行保留在所需的数据集中。

换句话说,当我们将shop_id和id分组后,我仅想查看状态为“坏”的日期最大时的行。但是,当两种状态的日期相同时,请保留行。


desired_df <- tribble(
  ~shop_id,  ~id,      ~key,      ~date,      ~status, 
  "1",       "20",     "ksr",   '2020-05-01',   'bad',
  "1",       "20",     "tyz",   '2020-05-02',   'bad',
  "2",       "30",     "ert",   '2020-05-01',   'bad',
  "2",       "40",     "yer",   '2020-05-05',   'good',
  "2",       "40",     "tet",   '2020-05-05',   'bad', 
)

任何帮助或帮助将不胜感激!

1 个答案:

答案 0 :(得分:2)

一种方法是使用case_when

df %>%
  mutate(date = ymd(date)) %>%
  group_by(shop_id,id) %>% 
  mutate(filter = case_when(all(status != "bad") ~ FALSE,
                            all(status == "bad") ~ TRUE,
                            all(status[date == min(date)] == "bad") ~ FALSE,
                            any(status[date == min(date)] == "good") ~ TRUE,
                            TRUE ~ FALSE)) %>%
  filter(filter == TRUE) %>% 
  dplyr::select(-filter)

# A tibble: 5 x 5
# Groups:   shop_id, id [3]
  shop_id id    key   date       status
  <chr>   <chr> <chr> <date>     <chr> 
1 1       20    ksr   2020-05-01 bad   
2 1       20    tyz   2020-05-02 bad   
3 2       30    ert   2020-05-01 bad   
4 2       40    yer   2020-05-05 good  
5 2       40    tet   2020-05-05 bad