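I have a data frame with a datetime column. I want to count the number of rows for each hour of the day, but I only care about rows between 8 AM and 10 PM.
With the lubridate package, hours of the day have to be filtered using 24-hour time.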
library(tidyverse)
library(lubridate)
### Fake Data with Date-time ----
x <- seq.POSIXt(as.POSIXct('1999-01-01'), as.POSIXct('1999-02-01'), length.out=1000)
df <- data.frame(myDateTime = x)
### Get all rows between 8 AM and 10 PM (inclusive)
df %>%
  mutate(myHour = hour(myDateTime)) %>%
  filter(myHour >= 8, myHour <= 22) %>% ## between 8 AM and 10 PM (both inclusive)
  count(myHour) ## number of rows
Is there a way to use 10:00 PM instead of the integer 22?
Answer 0 (score: 2)
You can use the ymd_hm and hour functions to convert from 12-hour to 24-hour time.
df %>%
  mutate(myHour = hour(myDateTime)) %>%
  filter(myHour >= hour(ymd_hm("2000-01-01 8:00 AM")), ## hour() ignores year, month, date
         myHour <= hour(ymd_hm("2000-01-01 10:00 PM"))) %>% ## between 8 AM and 10 PM (both inclusive)
  count(myHour)
A more elegant solution wraps the conversion in a helper function:
## custom function to convert 12 hour time to 24 hour time
hourOfDay_12to24 <- function(time12hrFmt){
  out <- paste("2000-01-01", time12hrFmt)
  out <- hour(ymd_hm(out))
  out
}
df %>%
  mutate(myHour = hour(myDateTime)) %>%
  filter(myHour >= hourOfDay_12to24("8:00 AM"),
         myHour <= hourOfDay_12to24("10:00 PM")) %>% ## between 8 AM and 10 PM (both inclusive)
  count(myHour)
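The helper above silently returns NA (with a parse warning) if the string cannot be parsed, and an NA bound makes filter() drop every row. A minimal sketch of a stricter variant follows. The name hour_of_day_checked is hypothetical and not part of the original answer; it relies on the quiet argument of ymd_hm and stops with an error on unparseable input. It still accepts anything ymd_hm can parse, so it does not enforce the AM/PM suffix.

```r
library(lubridate)

# Hypothetical helper (not from the original answer): vectorized over
# time12, and it fails loudly instead of returning NA.
hour_of_day_checked <- function(time12) {
  # quiet = TRUE suppresses the parse warning; failed parses become NA
  parsed <- ymd_hm(paste("2000-01-01", time12), quiet = TRUE)
  bad <- is.na(parsed)
  if (any(bad)) {
    stop("Could not parse as a time of day: ",
         paste(time12[bad], collapse = ", "))
  }
  # hour() ignores the dummy date that was pasted on
  hour(parsed)
}

bounds <- hour_of_day_checked(c("8:00 AM", "10:00 PM"))
```

The two elements of bounds can then be used in place of the two separate helper calls inside filter().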
Answer 1 (score: 1)
You can also do this in base R:
# Extract the hour
df$hour_day <- as.numeric(format(df$myDateTime, "%H"))
# Subset data between 08:00 AM and 10:00 PM
new_df <- df[df$hour_day >= as.integer(format(as.POSIXct("08:00 AM", format = "%I:%M %p"), "%H")) &
             df$hour_day <= as.integer(format(as.POSIXct("10:00 PM", format = "%I:%M %p"), "%H")), ]
# Count the frequency
stack(table(new_df$hour_day))
#    values ind
# 1      42   8
# 2      42   9
# 3      41  10
# 4      42  11
# 5      42  12
# 6      41  13
# 7      42  14
# 8      41  15
# 9      42  16
# 10     42  17
# 11     41  18
# 12     42  19
# 13     42  20
# 14     41  21
# 15     42  22
This gives the same output as the tidyverse / lubridate approach:
library(tidyverse)
library(lubridate)
df %>%
  mutate(myHour = hour(myDateTime)) %>%
  filter(myHour >= hour(ymd_hm("2000-01-01 8:00 AM")),
         myHour <= hour(ymd_hm("2000-01-01 10:00 PM"))) %>%
  count(myHour)
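The base R conversion in this answer can also be pulled into a small helper so the subsetting line stays readable. This is only a sketch under a few assumptions: to_hour24 is a hypothetical name, strptime is swapped in for the as.POSIXct/format pair, and the sample data is built in UTC so the result does not depend on the machine's time zone. Note that %p (AM/PM) parsing depends on the locale.

```r
# Assumption: sample data in UTC (the original uses the local time zone)
x <- seq(as.POSIXct("1999-01-01", tz = "UTC"),
         as.POSIXct("1999-02-01", tz = "UTC"),
         length.out = 1000)
df <- data.frame(myDateTime = x)

# Hypothetical helper: 12-hour clock string -> integer hour (0 to 23)
to_hour24 <- function(time12) {
  parsed <- strptime(time12, format = "%I:%M %p", tz = "UTC")
  if (any(is.na(parsed))) stop("Could not parse 12-hour time")
  parsed$hour  # the hour field of a POSIXlt object
}

start_hour <- to_hour24("08:00 AM")
end_hour <- to_hour24("10:00 PM")

# Extract the hour, then keep rows between the two bounds (both inclusive)
df$hour_day <- as.integer(format(df$myDateTime, "%H", tz = "UTC"))
new_df <- df[df$hour_day >= start_hour & df$hour_day <= end_hour, , drop = FALSE]

# Count the rows per hour
table(new_df$hour_day)
```

Computing the bounds once, outside the subsetting expression, also avoids re-parsing the same strings on every comparison.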