Question

I have a dataset that looks like this.

library(tibble)  # provides tribble()

df <- tribble(
  ~shop_id,  ~id,      ~key,        ~date,      ~status, 
  "1",       "10",     "abc",    '2020-05-04',   'good',
  "1",       "10",     "def",    '2020-05-03',   'normal',
  "1",       "10",     "glm",    '2020-05-03',   'bad',
  "1",       "20",     "ksr",    '2020-05-01',   'bad',
  "1",       "20",     "tyz",    '2020-05-02',   'bad',
  "2",       "20",     "uyv",    '2020-05-01',   'good',
  "2",       "20",     "mys",    '2020-05-01',   'normal',
  "2",       "30",     "ert",    '2020-05-01',   'bad',
  "2",       "40",     "yer",    '2020-05-05',   'good',
  "2",       "40",     "tet",    '2020-05-05',   'bad',
)

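Now I want to filter the data using the following conditions:

Group the data by shop_id and id, then look at the dates. Then:

If the date is the minimum when status == 'bad', remove the rows. For example, the first three rows are removed from the dataset because of this condition (see desired_df).
If the group contains only the 'bad' status, keep all rows. Rows 4 and 5 remain in the desired dataset because of this condition.
If the date is the same across the rows when status == 'bad', keep both rows in the desired dataset.

In other words, after grouping by shop_id and id, I only want to see the rows where the date of the 'bad' status is the maximum. However, when both statuses have the same date, keep the rows.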


desired_df <- tribble(
  ~shop_id,  ~id,      ~key,      ~date,      ~status, 
  "1",       "20",     "ksr",   '2020-05-01',   'bad',
  "1",       "20",     "tyz",   '2020-05-02',   'bad',
  "2",       "30",     "ert",   '2020-05-01',   'bad',
  "2",       "40",     "yer",   '2020-05-05',   'good',
  "2",       "40",     "tet",   '2020-05-05',   'bad', 
)

Any help would be greatly appreciated!

Answer 1

One approach is to use case_when.

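library(dplyr)
library(lubridate)  # provides ymd()

df %>%
  mutate(date = ymd(date)) %>%
  group_by(shop_id, id) %>%
  # One keep/drop flag per group; case_when() uses the first condition that matches
  mutate(keep = case_when(all(status != "bad") ~ FALSE,  # no 'bad' rows: drop the group
                          all(status == "bad") ~ TRUE,   # only 'bad' rows: keep the group
                          all(status[date == min(date)] == "bad") ~ FALSE,  # earliest date is all 'bad': drop
                          any(status[date == min(date)] == "good") ~ TRUE,  # a 'good' row on the earliest date: keep
                          TRUE ~ FALSE)) %>%             # anything else: drop
  filter(keep) %>%
  dplyr::select(-keep)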

# A tibble: 5 x 5
# Groups:   shop_id, id [3]
  shop_id id    key   date       status
  <chr>   <chr> <chr> <date>     <chr> 
1 1       20    ksr   2020-05-01 bad   
2 1       20    tyz   2020-05-02 bad   
3 2       30    ert   2020-05-01 bad   
4 2       40    yer   2020-05-05 good  
5 2       40    tet   2020-05-05 bad
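
For comparison, here is a minimal sketch that follows the "in other words" wording of the question more literally: keep a group only if it has at least one 'bad' row and the latest 'bad' date equals the latest date in the group. This is an assumption about what the asker intends, not the rule used in the case_when answer above. The two agree on the sample data but can differ on other inputs (for example, a group with 'bad' and 'good' on the earliest date plus a later 'good' row). The name result is only for illustration.

```r
library(dplyr)
library(tibble)

# Sample data copied from the question
df <- tribble(
  ~shop_id, ~id,  ~key,  ~date,        ~status,
  "1",      "10", "abc", "2020-05-04", "good",
  "1",      "10", "def", "2020-05-03", "normal",
  "1",      "10", "glm", "2020-05-03", "bad",
  "1",      "20", "ksr", "2020-05-01", "bad",
  "1",      "20", "tyz", "2020-05-02", "bad",
  "2",      "20", "uyv", "2020-05-01", "good",
  "2",      "20", "mys", "2020-05-01", "normal",
  "2",      "30", "ert", "2020-05-01", "bad",
  "2",      "40", "yer", "2020-05-05", "good",
  "2",      "40", "tet", "2020-05-05", "bad"
)

result <- df %>%
  mutate(date = as.Date(date)) %>%
  group_by(shop_id, id) %>%
  # Assumed rule: the group must contain a 'bad' row, and the latest 'bad'
  # date must be the latest date in the group. && short-circuits, so max()
  # is never called on an empty vector for groups without 'bad' rows.
  filter(any(status == "bad") &&
           max(date[status == "bad"]) == max(date)) %>%
  ungroup()

result$key
```

Because the condition is a single TRUE or FALSE per group, filter() keeps or drops whole groups, so no helper column is needed. On the sample data the remaining keys are the same five rows as in desired_df.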
