I have the following dataset:
df
email_id date
xyz@gmail.com 23-12-2018 21:33
xyz@gmail.com 23-12-2018 21:34
xyz@gmail.com 23-12-2018 21:35
xyz@gmail.com 23-12-2018 21:36
xyz@gmail.com 23-12-2018 21:37
abc@yahoo.com 23-12-2018 21:09
abc@yahoo.com 23-12-2018 21:10
abc@yahoo.com 23-12-2018 21:11
abc@yahoo.com 23-12-2018 21:12
abc@yahoo.com 23-12-2018 21:13
lmn@outlook.com 23-12-2018 21:44
lmn@outlook.com 23-12-2018 21:45
lmn@outlook.com 23-12-2018 21:46
lmn@outlook.com 23-12-2018 21:47
I am trying to find each unique email together with its latest timestamp. The expected output looks like this:
email_id date
xyz@gmail.com 23-12-2018 21:37
abc@yahoo.com 23-12-2018 21:13
lmn@outlook.com 23-12-2018 21:47
Can this be done with dplyr, or should I try a SQL GROUP BY query instead? Any help is appreciated.
Answer 0: (score: 1)
Using data.table:
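# Parse the day-month-year strings into POSIXct so they compare chronologically
DT[, date := as.POSIXct(date, format = "%d-%m-%Y %H:%M", tz = "")]
# For each email_id, keep the row with the latest date
DT[, .SD[which.max(date)], by = email_id]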
email_id date
1: xyz@gmail.com 2018-12-23 21:37:00
2: abc@yahoo.com 2018-12-23 21:13:00
3: lmn@outlook.com 2018-12-23 21:47:00
where:
DT <- fread("email_id, date
xyz@gmail.com, 23-12-2018 21:33
xyz@gmail.com, 23-12-2018 21:34
xyz@gmail.com, 23-12-2018 21:35
xyz@gmail.com, 23-12-2018 21:36
xyz@gmail.com, 23-12-2018 21:37
abc@yahoo.com, 23-12-2018 21:09
abc@yahoo.com, 23-12-2018 21:10
abc@yahoo.com, 23-12-2018 21:11
abc@yahoo.com, 23-12-2018 21:12
abc@yahoo.com, 23-12-2018 21:13
lmn@outlook.com, 23-12-2018 21:44
lmn@outlook.com, 23-12-2018 21:45
lmn@outlook.com, 23-12-2018 21:46
lmn@outlook.com, 23-12-2018 21:47")
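Since the question also asks about dplyr, here is a minimal sketch of the same idea, assuming dplyr 1.0.0 or later (for `slice_max`). The data frame name `df` comes from the question; the result name `latest` and the choice of `tz = "UTC"` are assumptions made only for this illustration. The key step is the same as above: convert `date` to a real date-time first, because the raw `dd-mm-yyyy` strings do not sort chronologically.

```r
library(dplyr)

# Sample data rebuilt from the question (minutes 33-37, 09-13, 44-47)
df <- data.frame(
  email_id = c(rep("xyz@gmail.com", 5),
               rep("abc@yahoo.com", 5),
               rep("lmn@outlook.com", 4)),
  date = sprintf("23-12-2018 21:%02d", c(33:37, 9:13, 44:47)),
  stringsAsFactors = FALSE
)

latest <- df %>%
  # Parse the strings into POSIXct (tz = "UTC" is an assumption for this sketch)
  mutate(date = as.POSIXct(date, format = "%d-%m-%Y %H:%M", tz = "UTC")) %>%
  group_by(email_id) %>%
  # Keep exactly one row per email_id: the one with the largest date
  slice_max(date, n = 1, with_ties = FALSE) %>%
  ungroup()

print(latest)
```

Note that the rows may come back ordered by `email_id` rather than in the original order of appearance. `with_ties = FALSE` guarantees a single row per email even if two rows share the same latest timestamp.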