Question

I have a column of times that were entered as raw text. Here is an example (data entry code at the bottom of the post):

#>   id    time
#> 1 NA    <NA>
#> 2  1 7:50 pm
#> 3  2 7:20 pm
#> 4  3 3:20 pm

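I'd like to add indicator variables that flag, for example, whether the time is:

after 7 pm
between 7 pm and 7:30 pm

So my desired output would look like this: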

#>   id    time before_1930 between_1900_1930
#> 1 NA    <NA>          NA                NA
#> 2  1 7:50 pm           0                 0
#> 3  2 7:20 pm           1                 1
#> 4  3 3:20 pm           1                 0

So far I have tried reading the times with parse_date_time, but this adds a date:

library(dplyr)      # provides %>% and mutate()
library(lubridate)
df <- df %>% mutate(time = lubridate::parse_date_time(time, '%I:%M %p'))
df

#>   id                time
#> 1 NA                <NA>
#> 2  1 0000-01-01 19:50:00
#> 3  2 0000-01-01 19:20:00
#> 4  3 0000-01-01 15:20:00

Is there a simple way to work with the hours and minutes directly and then create the dummy variables I mentioned?

Data entry code

df <- data.frame(
          id = c(NA, 1, 2, 3),
        time = c(NA, "7:50 pm", "7:20 pm", "3:20 pm")
)

Answer 1

Use the output of parse_date_time to compute the number of hours elapsed since midnight on 0000-01-01, rather than trying to handle it as a date/time.

df <- data.frame(
  id = c(NA, 1, 2, 3),
  time = c(NA, "7:50 pm", "7:20 pm", "3:20 pm")
)

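library(dplyr)
library(lubridate)
df <- df %>% mutate(
  # Parse the text; the result carries the placeholder date 0000-01-01 (UTC)
  time = lubridate::parse_date_time(time, '%I:%M %p'),
  # Hours elapsed since midnight of that placeholder date
  time = difftime(time, as.POSIXct("0000-01-01", tz = "UTC"), units = "hours"),
  # 1 if before 19:30, 0 otherwise; missing times stay NA
  before_1930 = as.numeric(time < 19.5),
  # 1 if strictly between 19:00 and 19:30, 0 otherwise
  between_1900_1930 = as.numeric(time > 19 & time < 19.5)
)
df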
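
The same "time since midnight" idea can also be sketched without building a date at all, by reading the hour and minute straight from the text. This is only a sketch in base R: `to_minutes()` is a hypothetical helper (not part of any package), it assumes every non-missing entry looks like "h:mm am" or "h:mm pm", and it treats 19:00 itself as inside the interval, which is an assumption the question does not settle.

```r
df <- data.frame(
  id = c(NA, 1, 2, 3),
  time = c(NA, "7:50 pm", "7:20 pm", "3:20 pm")
)

# Hypothetical helper: "h:mm am/pm" text -> minutes since midnight.
# Missing or unrecognised entries come back as NA.
to_minutes <- function(x) {
  pattern <- "^([0-9]{1,2}):([0-9]{2}) *([ap]m)$"
  clean <- tolower(trimws(as.character(x)))
  ok <- !is.na(x) & grepl(pattern, clean)
  out <- rep(NA_integer_, length(x))
  hh <- as.integer(sub(pattern, "\\1", clean[ok]))
  mm <- as.integer(sub(pattern, "\\2", clean[ok]))
  is_pm <- sub(pattern, "\\3", clean[ok]) == "pm"
  # 12 am maps to hour 0 and 12 pm to hour 12
  out[ok] <- (hh %% 12L + 12L * is_pm) * 60L + mm
  out
}

mins <- to_minutes(df$time)
# 19:30 is 1170 minutes after midnight, 19:00 is 1140
df$before_1930 <- as.integer(mins < 19 * 60 + 30)
df$between_1900_1930 <- as.integer(mins >= 19 * 60 & mins < 19 * 60 + 30)
df
```

Working in whole minutes avoids comparing fractional hours, and the NA row stays NA in both indicator columns.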

Answer 2

Try this:

library(dplyr)
library(lubridate)
data.frame(
   id = c(NA, 1, 2, 3),
   time = c(NA, "7:50 pm", "7:20 pm", "3:20 pm")
 ) %>% 
   mutate(real_time = lubridate::parse_date_time(time, '%I:%M %p'),
          # Conditions are checked in order; the first match wins
          is_before = case_when(
            hour(real_time) < 19  ~ "Before 19",
            hour(real_time) == 19 & minute(real_time) < 30 ~ "19:00 - 19:30",
            # Catch-all: note that rows with a missing time also end up here
            TRUE ~ "After 19:30"
          ))

#>   id    time           real_time     is_before
#> 1 NA    <NA>                <NA>   After 19:30
#> 2  1 7:50 pm 0000-01-01 19:50:00   After 19:30
#> 3  2 7:20 pm 0000-01-01 19:20:00 19:00 - 19:30
#> 4  3 3:20 pm 0000-01-01 15:20:00     Before 19
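
In the output above the row with a missing time is labelled "After 19:30", because a condition that evaluates to NA never matches and the row falls through to the catch-all. If that is not wanted, one possible variant (a sketch, not necessarily what the answer's author intended) is to test for NA first; the name `labelled` is only illustrative.

```r
library(dplyr)
library(lubridate)

df <- data.frame(
  id = c(NA, 1, 2, 3),
  time = c(NA, "7:50 pm", "7:20 pm", "3:20 pm")
)

labelled <- df %>%
  mutate(
    real_time = parse_date_time(time, "%I:%M %p"),
    is_before = case_when(
      # Handle missing times explicitly so they are not swept into the catch-all
      is.na(real_time) ~ NA_character_,
      hour(real_time) < 19 ~ "Before 19",
      hour(real_time) == 19 & minute(real_time) < 30 ~ "19:00 - 19:30",
      TRUE ~ "After 19:30"
    )
  )
labelled
```

With this ordering the first row gets NA in `is_before`, while the other three rows keep the labels shown above.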

Create indicator variables if a time falls within a certain interval
