How do I use the ranking function Row_number with multiple conditions in R?

Time: 2018-12-26 23:18:56

Tags: r dplyr

Here is my dummy dataset:

ID        Order       Case         Date_created      
123456   25800265        1     2018-06-27 07:40:23 
123456   25800265        1     2018-06-25 05:29:23
123456   25800265        0     2018-07-26 06:16:28
789454   25906588        1     2018-07-12 05:59:50
789454   25906588        0     2018-07-12 07:41:29
789454   25906588        0     2018-07-10 05:43:45
789454   25906588        0     2018-07-09 05:59:26
789454   25906588        0     2018-07-05 10:39:45
287541   32140567        0     2018-07-12 07:41:29
287541   32140567        0     2018-07-10 05:43:45
287541   32140567        0     2018-07-09 05:59:26
287541   32140567        0     2018-07-05 10:39:45

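I need exactly one record per Order, chosen by the following rules. If an Order has both 0 and 1 in Case, return a record with Case = 1; if there are several records with Case = 1, take the one with the oldest Date_created. If an Order has only Case = 0, return the record with the oldest Date_created. The expected result is: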

ID        Order       Case         Date_created        
123456   25800265        1     2018-06-25 05:29:23
789454   25906588        1     2018-07-12 05:59:50
287541   32140567        0     2018-07-05 10:39:45

In Redshift, I can do this with the following query:

select *
from (
  select *,
         ROW_NUMBER() over (partition by "Order" order by "Case" desc, Date_created) as latest_time
  from tbl
) t
where latest_time = 1;
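For readers mapping the query clause by clause, each part of it has a direct base R counterpart: `order()` plays the role of PARTITION BY plus the window's ORDER BY, `ave(..., FUN = seq_along)` numbers rows within each group like ROW_NUMBER(), and a logical subset stands in for the outer WHERE. This is a minimal sketch, not the asker's code; it assumes the timestamps can be parsed as UTC (the question gives no time zone), and `ord` and `result` are illustrative names.

```r
# Sample data from the question; Date_created parsed as UTC (an assumption,
# the question does not state a time zone)
df <- data.frame(
  ID = c("123456","123456","123456","789454","789454","789454","789454","789454","287541","287541","287541","287541"),
  Order = c("25800265","25800265","25800265","25906588","25906588","25906588","25906588","25906588","32140567","32140567","32140567","32140567"),
  Case = c(1,1,0,1,0,0,0,0,0,0,0,0),
  Date_created = as.POSIXct(c("2018-06-27 07:40:23","2018-06-25 05:29:23","2018-07-26 06:16:28","2018-07-12 05:59:50","2018-07-12 07:41:29","2018-07-10 05:43:45","2018-07-09 05:59:26","2018-07-05 10:39:45","2018-07-12 07:41:29","2018-07-10 05:43:45","2018-07-09 05:59:26","2018-07-05 10:39:45"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# PARTITION BY Order ORDER BY Case DESC, Date_created:
# sort by the partition key first, then by the window's ordering keys
ord <- df[order(df$Order, -df$Case, df$Date_created), ]

# ROW_NUMBER(): number the rows 1, 2, ... within each Order, in sorted order
ord$latest_time <- ave(seq_len(nrow(ord)), ord$Order, FUN = seq_along)

# WHERE latest_time = 1
result <- ord[ord$latest_time == 1, c("ID", "Order", "Case", "Date_created")]
rownames(result) <- NULL
print(result)
```

The negated `Case` gives the descending sort on that column while the timestamp stays ascending, matching `Case desc, Date_created` in the SQL.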

How can I do the same thing in R?

1 answer:

Answer 0 (score: 4)

Here you go:

library(dplyr)

df <- data.frame(
  ID = c("123456","123456","123456","789454","789454","789454","789454","789454","287541","287541","287541","287541"),
  Order = c("25800265","25800265","25800265","25906588","25906588","25906588","25906588","25906588","32140567","32140567","32140567","32140567"),
  Case = c(1,1,0,1,0,0,0,0,0,0,0,0),
  Date_created = c("2018-06-27 07:40:23","2018-06-25 05:29:23","2018-07-26 06:16:28","2018-07-12 05:59:50","2018-07-12 07:41:29","2018-07-10 05:43:45","2018-07-09 05:59:26","2018-07-05 10:39:45","2018-07-12 07:41:29","2018-07-10 05:43:45","2018-07-09 05:59:26","2018-07-05 10:39:45"),
  stringsAsFactors = FALSE
)

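df %>% 
  # Parse the timestamps so they sort chronologically
  mutate(Date_created = as.POSIXct(Date_created)) %>% 
  group_by(Order) %>% 
  # Case = 1 rows first, then the oldest Date_created first; arrange() ignores
  # the grouping by default, but a global sort also orders rows within each Order
  arrange(desc(Case), Date_created) %>% 
  # Number the rows within each Order, like ROW_NUMBER() OVER (PARTITION BY Order ...)
  mutate(row = row_number()) %>% 
  ungroup() %>% 
  # Keep the first-ranked row of each Order and drop the helper column
  filter(row == 1) %>% 
  select(-row) %>% 
  arrange(Order)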
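A slightly shorter dplyr variant skips the helper column: once the rows are sorted, `distinct(Order, .keep_all = TRUE)` keeps the first row it sees for each Order, which is the same row that `row == 1` selects. This is a sketch, assuming a dplyr version that supports `.keep_all` in `distinct()`, and it parses the timestamps as UTC because the question gives no time zone.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID = c("123456","123456","123456","789454","789454","789454","789454","789454","287541","287541","287541","287541"),
  Order = c("25800265","25800265","25800265","25906588","25906588","25906588","25906588","25906588","32140567","32140567","32140567","32140567"),
  Case = c(1,1,0,1,0,0,0,0,0,0,0,0),
  Date_created = c("2018-06-27 07:40:23","2018-06-25 05:29:23","2018-07-26 06:16:28","2018-07-12 05:59:50","2018-07-12 07:41:29","2018-07-10 05:43:45","2018-07-09 05:59:26","2018-07-05 10:39:45","2018-07-12 07:41:29","2018-07-10 05:43:45","2018-07-09 05:59:26","2018-07-05 10:39:45"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Parse the timestamps (UTC assumed) so they sort chronologically
  mutate(Date_created = as.POSIXct(Date_created, tz = "UTC")) %>%
  # Within each Order: Case = 1 rows first, then the oldest Date_created
  arrange(Order, desc(Case), Date_created) %>%
  # Keep only the first row of each Order (equivalent to row == 1 above)
  distinct(Order, .keep_all = TRUE)

print(result)
```

Because `Order` is the leading sort key, the result also comes out already sorted by Order, so no final `arrange(Order)` is needed.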