How to find unique rows based on the maximum date value

Time: 2018-08-29 13:35:53

Tags: r date

I have the following dataset:

df

email_id           date
xyz@gmail.com   23-12-2018 21:33
xyz@gmail.com   23-12-2018 21:34
xyz@gmail.com   23-12-2018 21:35
xyz@gmail.com   23-12-2018 21:36
xyz@gmail.com   23-12-2018 21:37
abc@yahoo.com   23-12-2018 21:09
abc@yahoo.com   23-12-2018 21:10
abc@yahoo.com   23-12-2018 21:11
abc@yahoo.com   23-12-2018 21:12
abc@yahoo.com   23-12-2018 21:13
lmn@outlook.com 23-12-2018 21:44
lmn@outlook.com 23-12-2018 21:45
lmn@outlook.com 23-12-2018 21:46
lmn@outlook.com 23-12-2018 21:47

I am trying to find each unique email together with its latest timestamp. The expected output looks like this:

email_id           date
xyz@gmail.com   23-12-2018 21:37
abc@yahoo.com   23-12-2018 21:13
lmn@outlook.com 23-12-2018 21:47

Can this be done with dplyr, or should I try a SQL GROUP BY query instead? Any help is appreciated.

1 answer:

Answer 0: (score: 1)

Using data.table:

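library(data.table)
# Parse the day-month-year strings into POSIXct so they compare chronologically
DT[, date := as.POSIXct(date, format = "%d-%m-%Y %H:%M", tz = "")]
# For each email_id, keep the row with the latest date
DT[, .SD[which.max(date)], by = email_id]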
          email_id                date
1:   xyz@gmail.com 2018-12-23 21:37:00
2:   abc@yahoo.com 2018-12-23 21:13:00
3: lmn@outlook.com 2018-12-23 21:47:00

where DT is:

DT <- fread("email_id,           date
xyz@gmail.com,   23-12-2018 21:33
xyz@gmail.com,   23-12-2018 21:34
xyz@gmail.com,   23-12-2018 21:35
xyz@gmail.com,   23-12-2018 21:36
xyz@gmail.com,   23-12-2018 21:37
abc@yahoo.com,   23-12-2018 21:09
abc@yahoo.com,   23-12-2018 21:10
abc@yahoo.com,   23-12-2018 21:11
abc@yahoo.com,   23-12-2018 21:12
abc@yahoo.com,   23-12-2018 21:13
lmn@outlook.com, 23-12-2018 21:44
lmn@outlook.com, 23-12-2018 21:45
lmn@outlook.com, 23-12-2018 21:46
lmn@outlook.com, 23-12-2018 21:47")
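
Since the question asks whether dplyr can do this, here is a minimal sketch of the same idea with dplyr. It is not part of the original answer. It assumes a plain data frame named df with character columns email_id and date; the shortened sample data and tz = "UTC" are assumptions made only to keep the example reproducible.

```r
library(dplyr)

# Shortened, hypothetical sample in the same shape as the question's data
df <- data.frame(
  email_id = c("xyz@gmail.com", "xyz@gmail.com",
               "abc@yahoo.com", "abc@yahoo.com",
               "lmn@outlook.com"),
  date = c("23-12-2018 21:36", "23-12-2018 21:37",
           "23-12-2018 21:13", "23-12-2018 21:09",
           "23-12-2018 21:47"),
  stringsAsFactors = FALSE
)

latest <- df %>%
  # Parse the strings first; comparing the raw text would not sort chronologically
  mutate(date = as.POSIXct(date, format = "%d-%m-%Y %H:%M", tz = "UTC")) %>%
  group_by(email_id) %>%
  # Keep exactly one row per group: the one with the maximum date
  slice(which.max(date)) %>%
  ungroup()

print(latest)
```

which.max returns a single index, so each email appears exactly once even if two rows share the same latest timestamp. The row order of the result may differ from the order in the input, because group_by sorts the groups.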