Get the first and last value for each day in R

Time: 2018-06-05 12:48:48

Tags: r dplyr tidyverse lubridate

I have a data frame with two columns that record online times.

I first separated the time from the date, using:

library(lubridate)
a1 <- dmy_hm(df$V2)
d1 <- data.frame(Date= format(a1, '%d/%m/%Y'), Time=format(a1, '%H:%M:%S'))

        Date     Time
31   04/06/2018 17:51:00
32   04/06/2018 17:50:00
33   04/06/2018 17:33:00
34   04/06/2018 17:33:00
35   04/06/2018 17:29:00
36   04/06/2018 17:29:00
37   04/06/2018 17:06:00
38   04/06/2018 17:06:00
39   04/06/2018 17:01:00
40   04/06/2018 17:01:00
41   04/06/2018 16:49:00
42   04/06/2018 16:49:00
43   04/06/2018 16:43:00
44   04/06/2018 16:43:00
45   04/06/2018 16:38:00
46   04/06/2018 16:38:00
47   04/06/2018 16:22:00
48   04/06/2018 16:22:00
49   04/06/2018 16:21:00
50   04/06/2018 16:21:00
51   04/06/2018 16:14:00
52   04/06/2018 16:14:00
53   04/06/2018 15:57:00
54   04/06/2018 15:57:00
89   04/06/2018 12:05:00
90   04/06/2018 12:05:00
91   04/06/2018 12:05:00
92   04/06/2018 12:05:00
93   04/06/2018 12:05:00
94   04/06/2018 12:05:00
100  04/06/2018 12:05:00
101  04/06/2018 12:05:00

How can I get the first and last time for each day? I tried:

d1 %>% 
  group_by(Date) %>% 
  summarise(Min = min(Time), Max= max(Time))

But I get this error message:

Error in summarise_impl(.data, dots) : 
  Evaluation error: 'min' not meaningful for factors.

2 answers:

Answer 0: (score: 2)

You can sort the data and use first and last instead of min and max:

library(dplyr)
d1 %>% 
  arrange(Time) %>%
  group_by(Date) %>% 
  summarise(Min = first(Time), Max= last(Time))

# # A tibble: 1 x 3
#           Date      Min      Max
#         <fctr>   <fctr>   <fctr>
#   1 04/06/2018 12:05:00 17:51:00

Alternatively, you can use stringsAsFactors = FALSE in the data.frame call. min and max work with character vectors; they just don't work with unordered factors:

d1 <- data.frame(Date= format(a1, '%d/%m/%Y'), Time=format(a1, '%H:%M:%S'),stringsAsFactors = FALSE)

library(dplyr)
d1 %>% 
  group_by(Date) %>% 
  summarise(Min = min(Time), Max= max(Time))

# # A tibble: 1 x 3
#           Date      Min      Max
#          <chr>    <chr>    <chr>
#   1 04/06/2018 12:05:00 17:51:00
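
To illustrate the point about factors, here is a minimal base R sketch. The three sample times and the variable names are made up for the illustration. It relies on the times being zero-padded "HH:MM:SS" strings, so alphabetical order matches chronological order.

```r
# Hypothetical sample times (not the asker's data)
times_chr <- c("17:51:00", "12:05:00", "16:22:00")

# min()/max() compare character vectors alphabetically, which is fine here
chr_min <- min(times_chr)
chr_max <- max(times_chr)

# The same values as an unordered factor: min() raises an error,
# captured here as a message string instead of stopping the script
times_fct <- factor(times_chr)
fct_msg <- tryCatch(min(times_fct), error = function(e) conditionMessage(e))

# An ordered factor has a defined ordering, so min() works again
times_ord <- factor(times_chr, levels = sort(times_chr), ordered = TRUE)
ord_min <- as.character(min(times_ord))

print(chr_min)
print(fct_msg)
print(ord_min)
```

Note that from R 4.0.0 onwards data.frame() defaults to stringsAsFactors = FALSE, so the original error would mostly show up on older versions.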

Data

datetimes <- c(
'04/06/2018 17:51:00',
'04/06/2018 17:50:00',
'04/06/2018 17:33:00',
'04/06/2018 17:33:00',
'04/06/2018 17:29:00',
'04/06/2018 17:29:00',
'04/06/2018 17:06:00',
'04/06/2018 17:06:00',
'04/06/2018 17:01:00',
'04/06/2018 17:01:00',
'04/06/2018 16:49:00',
'04/06/2018 16:49:00',
'04/06/2018 16:43:00',
'04/06/2018 16:43:00',
'04/06/2018 16:38:00',
'04/06/2018 16:38:00',
'04/06/2018 16:22:00',
'04/06/2018 16:22:00',
'04/06/2018 16:21:00',
'04/06/2018 16:21:00',
'04/06/2018 16:14:00',
'04/06/2018 16:14:00',
'04/06/2018 15:57:00',
'04/06/2018 15:57:00',
'04/06/2018 12:05:00',
'04/06/2018 12:05:00',
'04/06/2018 12:05:00',
'04/06/2018 12:05:00',
'04/06/2018 12:05:00',
'04/06/2018 12:05:00',
'04/06/2018 12:05:00')

library(lubridate)
a1 <- dmy_hms(datetimes)
d1 <- data.frame(Date= format(a1, '%d/%m/%Y'), Time=format(a1, '%H:%M:%S'))

Answer 1: (score: 1)

Translating Mudskipper's solution to the fast and concise data.table:

library(data.table)
setDT(d1)
d1[order(Time), .(Min = Time[1], Max = Time[.N]), Date]
         Date      Min      Max
1: 04/06/2018 12:05:00 17:51:00

And why not compare with base R as well:

aggregate(Time ~ Date, d1, function(x) c(Min = min(x), Max = max(x)))
        Date Time.Min Time.Max
1 04/06/2018 12:05:00 17:51:00
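
The solutions above all compare the formatted time strings. As a possible variation, here is a sketch that compares the parsed date-times themselves and only formats the result at the end. It uses base R only; the two-day sample, the UTC time zone and the names stamps, parsed, by_day and result are assumptions made for the example, not part of the original question.

```r
# Hypothetical sample spanning two days (not the asker's data)
stamps <- c("04/06/2018 17:51:00", "04/06/2018 12:05:00",
            "05/06/2018 09:10:00", "05/06/2018 18:30:00",
            "05/06/2018 13:00:00")

# Parse to POSIXct; the time zone is fixed to UTC as an assumption
parsed <- as.POSIXct(stamps, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Split the date-times by calendar day, keyed by the formatted date
by_day <- split(parsed, format(parsed, "%d/%m/%Y"))

# min()/max() work directly on POSIXct; format only for display
result <- data.frame(
  Date = names(by_day),
  Min  = vapply(by_day, function(x) format(min(x), "%H:%M:%S"), character(1)),
  Max  = vapply(by_day, function(x) format(max(x), "%H:%M:%S"), character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

print(result)
```

Comparing real date-times avoids relying on the string format sorting correctly, which matters if the times are ever stored without zero padding.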