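Filter an R data frame by hour of day

Time: 2019-07-27 03:07:27

Tags: r lubridate

I have a data frame with a datetime column. I want to know the number of rows for each hour of the day. However, I only care about rows between 8 AM and 10 PM.

lubridate's hour() returns the hour on a 24-hour clock, so the filter has to be written with 24-hour values.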

library(tidyverse)
library(lubridate)

### Fake Data with Date-time ----
x <- seq.POSIXt(as.POSIXct('1999-01-01'), as.POSIXct('1999-02-01'), length.out=1000)

df <- data.frame(myDateTime = x)

### Get all rows between 8 AM and 10 PM (inclusive)

df %>% 
  mutate(myHour = hour(myDateTime)) %>% 
  filter(myHour >= 8, myHour <= 22) %>%  ## between 8 AM and 10 PM (both inclusive)
  count(myHour) ## number of rows

Is there a way to write 10:00 PM instead of the integer 22?

2 answers:

Answer 0: (score: 2)

You can use the ymd_hm and hour functions to convert from 12-hour to 24-hour time.

df %>% 
  mutate(myHour = hour(myDateTime)) %>% 
  filter(myHour >= hour(ymd_hm("2000-01-01 8:00 AM")), ## hour() ignores year, month, date
         myHour <= hour(ymd_hm("2000-01-01 10:00 PM"))) %>%  ## between 8 AM and 10 PM (both inclusive)
  count(myHour)
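As a quick sanity check of the conversion used in the filter, the two bounds can be computed on their own. This is only a sketch: the names lower and upper are introduced here for illustration, and it assumes ymd_hm accepts the AM/PM suffix the way the answer relies on.

```r
library(lubridate)

# The date part is a throwaway placeholder; only the time of day matters.
lower <- hour(ymd_hm("2000-01-01 8:00 AM"))
upper <- hour(ymd_hm("2000-01-01 10:00 PM"))

# Both bounds are plain numbers on the 24-hour clock, so they can be
# compared directly with hour(myDateTime).
print(c(lower = lower, upper = upper))
```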

A more elegant solution:

## custom function to convert 12 hour time to 24 hour time

hourOfDay_12to24 <- function(time12hrFmt){
  out <- paste("2000-01-01", time12hrFmt)
  out <- hour(ymd_hm(out))
  out
}

df %>% 
  mutate(myHour = hour(myDateTime)) %>% 
  filter(myHour >= hourOfDay_12to24("8:00 AM"),
         myHour <= hourOfDay_12to24("10:00 PM")) %>%  ## between 8 AM and 10 PM (both inclusive)
  count(myHour)
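If pasting a placeholder date in front of the time feels awkward, one possible variation is to parse the 12-hour string directly with parse_date_time. This is a sketch rather than part of the original answer: hour_from_12h is a hypothetical name, and the "IMp" order (12-hour hour, minute, AM/PM marker) is assumed to match strings such as "10:00 PM".

```r
library(lubridate)

# Hypothetical helper: convert 12-hour clock strings to 24-hour hours
# without building a full date-time string first.
hour_from_12h <- function(time12hrFmt) {
  parsed <- parse_date_time(time12hrFmt, orders = "IMp")
  hour(parsed)
}

# The helper is vectorised, so both bounds can be computed in one call.
bounds <- hour_from_12h(c("8:00 AM", "10:00 PM"))
print(bounds)
```

The bounds can then be used in the filter as myHour >= bounds[1], myHour <= bounds[2].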

Answer 1: (score: 1)

You can also use base R.

#Extract the hour 
df$hour_day <- as.numeric(format(df$myDateTime, "%H"))

#Subset data between 08:00 AM and 10:00 PM (both inclusive)
new_df <- df[df$hour_day >= as.integer(format(as.POSIXct("08:00 AM", format = "%I:%M %p"), "%H")) &
             df$hour_day <= as.integer(format(as.POSIXct("10:00 PM", format = "%I:%M %p"), "%H")), ]
#Count the frequency
stack(table(new_df$hour_day))

#   values ind
#1      42   8
#2      42   9
#3      41  10
#4      42  11
#5      42  12
#6      41  13
#7      42  14
#8      41  15
#9      42  16
#10     42  17
#11     41  18
#12     42  19
#13     42  20
#14     41  21
#15     42  22

This gives the same output as the tidyverse / lubridate approach:

library(tidyverse)
library(lubridate)

df %>% 
  mutate(myHour = hour(myDateTime)) %>% 
  filter(myHour >= hour(ymd_hm("2000-01-01 8:00 AM")), 
         myHour <= hour(ymd_hm("2000-01-01 10:00 PM"))) %>%  
  count(myHour)
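The base R conversion can also be wrapped in a small helper so the subset condition stays readable. This is a minimal sketch: to_hour24 is a hypothetical name, and because %p is locale-dependent it assumes a locale in which "AM" and "PM" are recognised.

```r
# Hypothetical helper: convert a 12-hour clock string to an integer hour (0-23).
# strptime() fills in today's date, which is ignored once the hour is extracted.
to_hour24 <- function(time12) {
  strptime(time12, format = "%I:%M %p")$hour
}

# Same fake data as in the question.
x <- seq.POSIXt(as.POSIXct("1999-01-01"), as.POSIXct("1999-02-01"), length.out = 1000)
df <- data.frame(myDateTime = x)
df$hour_day <- as.integer(format(df$myDateTime, "%H"))

# Keep rows between 8 AM and 10 PM (both inclusive), then count per hour.
keep <- df$hour_day >= to_hour24("8:00 AM") & df$hour_day <= to_hour24("10:00 PM")
new_df <- df[keep, , drop = FALSE]
print(table(new_df$hour_day))
```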