Question

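I am working with time/date variables and trying to estimate the time spent on each record. I am doing this analysis in two steps: (a) converting the variables into the desired format, and (b) calculating the time spent on each question. Here is what my dataset looks like: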

id <-     c(1,1,1,1,1, 2,2,2,2,2)
item.id <- c(1,2,3,4,5, 1,2,3,4,5)
submit.time <-c("2019-04-09 09:50:30.340","2019-04-09 09:52:12.440","2019-04-09 09:52:15.787","2019-04-09 09:53:21.587","2019-04-09 09:53:49.047",
                "2019-04-09 09:49:45.243","2019-04-09 09:52:53.663","2019-04-09 09:53:23.293","2019-04-09 09:54:00.727","2019-04-09 09:54:52.400")
start.time <- c("04/09/2019 09:50:02.317 AM","04/09/2019 09:50:02.317 AM","04/09/2019 09:50:02.317 AM","04/09/2019 09:50:02.317 AM","04/09/2019 09:50:02.317 AM",
                "04/09/2019 09:47:42.583 AM","04/09/2019 09:47:42.583 AM","04/09/2019 09:47:42.583 AM","04/09/2019 09:47:42.583 AM","04/09/2019 09:47:42.583 AM")

data <- data.frame(id, item.id,start.time, submit.time)

> data
   id item.id                 start.time             submit.time
1   1       1 04/09/2019 09:50:02.317 AM 2019-04-09 09:50:30.340
2   1       2 04/09/2019 09:50:02.317 AM 2019-04-09 09:52:12.440
3   1       3 04/09/2019 09:50:02.317 AM 2019-04-09 09:52:15.787
4   1       4 04/09/2019 09:50:02.317 AM 2019-04-09 09:53:21.587
5   1       5 04/09/2019 09:50:02.317 AM 2019-04-09 09:53:49.047
6   2       1 04/09/2019 09:47:42.583 AM 2019-04-09 09:49:45.243
7   2       2 04/09/2019 09:47:42.583 AM 2019-04-09 09:52:53.663
8   2       3 04/09/2019 09:47:42.583 AM 2019-04-09 09:53:23.293
9   2       4 04/09/2019 09:47:42.583 AM 2019-04-09 09:54:00.727
10  2       5 04/09/2019 09:47:42.583 AM 2019-04-09 09:54:52.400

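id identifies each student, item.id is the question ID, start.time is the login time for the exam (a single value per student), and submit.time is the time when the student submitted the answer to each question.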

(a) Editing the data: this step removes AM|PM and switches the date order of start.time. I wanted start.time to use the same format as submit.time, so I edited start.time:

data$start.time <- gsub(" AM| PM", "", data$start.time) # exclude AM or PM
data$start.time <- gsub("/", "-", data$start.time) # replace / with -
dtparts = t(as.data.frame(strsplit(data$start.time, ' '))) # split date and time
row.names(dtparts) = NULL
data$newdate <- strptime(as.character(dtparts[,1]), "%m-%d-%Y") # switch the date order
data$newdate <- as.POSIXct(data$newdate) # R was complaining about the time format, so I had to change it here
data$start.time <- paste0(data$newdate, " ", dtparts[,2]) # bring the time back

Now both time variables look the same. I converted these dates and times to seconds:

library(dplyr)
data %>%
  mutate(start.time.num = as.numeric(as.POSIXct(start.time), units = "secs")) %>%
  mutate(submit.time.num = as.numeric(as.POSIXct(submit.time), units = "secs"))

   id item.id              start.time             submit.time    newdate start.time.num submit.time.num
1   1       1 2019-04-09 09:50:02.317 2019-04-09 09:50:30.340 2019-04-09     1554817802      1554817830
2   1       2 2019-04-09 09:50:02.317 2019-04-09 09:52:12.440 2019-04-09     1554817802      1554817932
3   1       3 2019-04-09 09:50:02.317 2019-04-09 09:52:15.787 2019-04-09     1554817802      1554817936
4   1       4 2019-04-09 09:50:02.317 2019-04-09 09:53:21.587 2019-04-09     1554817802      1554818002
5   1       5 2019-04-09 09:50:02.317 2019-04-09 09:53:49.047 2019-04-09     1554817802      1554818029
6   2       1 2019-04-09 09:47:42.583 2019-04-09 09:49:45.243 2019-04-09     1554817663      1554817785
7   2       2 2019-04-09 09:47:42.583 2019-04-09 09:52:53.663 2019-04-09     1554817663      1554817974
8   2       3 2019-04-09 09:47:42.583 2019-04-09 09:53:23.293 2019-04-09     1554817663      1554818003
9   2       4 2019-04-09 09:47:42.583 2019-04-09 09:54:00.727 2019-04-09     1554817663      1554818041
10  2       5 2019-04-09 09:47:42.583 2019-04-09 09:54:52.400 2019-04-09     1554817663      1554818092

(b) In this step, I want to calculate the time spent on each question. For the first student's first question, the time spent should be submit.time.num (1554817830) - start.time.num (1554817802) = 28. For the first student's second question, it should be submit.time.num (1554817932) - the previous submit.time.num (1554817830) = 102. This needs to be repeated for each student: when it reaches the second student, the first row should again use that student's start.time.num instead of the previous submit.time.num.
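For step (a), the gsub()/strsplit() editing can probably be skipped by giving as.POSIXct() an explicit format string for each column. A minimal base-R sketch, under two assumptions: tz = "UTC" (both columns are taken to be in the same zone, so the epoch seconds differ from the values above by the UTC offset, while the differences are unchanged), and an English or C locale, because %p matches the locale's AM/PM strings.

```r
# Data from the question
id      <- c(1,1,1,1,1, 2,2,2,2,2)
item.id <- c(1,2,3,4,5, 1,2,3,4,5)
submit.time <- c("2019-04-09 09:50:30.340", "2019-04-09 09:52:12.440", "2019-04-09 09:52:15.787",
                 "2019-04-09 09:53:21.587", "2019-04-09 09:53:49.047",
                 "2019-04-09 09:49:45.243", "2019-04-09 09:52:53.663", "2019-04-09 09:53:23.293",
                 "2019-04-09 09:54:00.727", "2019-04-09 09:54:52.400")
start.time <- rep(c("04/09/2019 09:50:02.317 AM", "04/09/2019 09:47:42.583 AM"), each = 5)

# Parse each column with its own format string instead of rewriting the text.
# %I + %p read the 12-hour clock and its AM/PM marker; %OS keeps fractional seconds.
# tz = "UTC" is an assumption: both columns are taken to be in the same zone.
data <- data.frame(
  id, item.id,
  start.time  = as.POSIXct(start.time,  format = "%m/%d/%Y %I:%M:%OS %p", tz = "UTC"),
  submit.time = as.POSIXct(submit.time, format = "%Y-%m-%d %H:%M:%OS",    tz = "UTC")
)

# Epoch seconds, if numeric columns are still wanted
data$start.time.num  <- as.numeric(data$start.time)
data$submit.time.num <- as.numeric(data$submit.time)
```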

So the additional column should look like this:

Sorry for the long post. If you have any suggestions on the first part, please let me know as well; more importantly, do you have any suggestions for part (b)?

Thanks

Answer 1

This can be done in a single pipeline, without any string manipulation:

library(dplyr)
data %>%
  mutate(
    start.time = as.POSIXct(start.time, format = "%m/%d/%Y %H:%M:%OS"),
    submit.time = as.POSIXct(submit.time),
    time.spent = difftime(submit.time, start.time, units = "secs")
  ) %>%
  group_by(id) %>%
  mutate(
    time.spent = c(time.spent[1], diff(time.spent))
  ) %>%
  ungroup()
# # A tibble: 10 x 5
#       id item.id start.time          submit.time         time.spent  
#    <dbl>   <dbl> <dttm>              <dttm>              <drtn>      
#  1     1       1 2019-04-09 09:50:02 2019-04-09 09:50:30  28.023 secs
#  2     1       2 2019-04-09 09:50:02 2019-04-09 09:52:12 102.100 secs
#  3     1       3 2019-04-09 09:50:02 2019-04-09 09:52:15   3.347 secs
#  4     1       4 2019-04-09 09:50:02 2019-04-09 09:53:21  65.800 secs
#  5     1       5 2019-04-09 09:50:02 2019-04-09 09:53:49  27.460 secs
#  6     2       1 2019-04-09 09:47:42 2019-04-09 09:49:45 122.660 secs
#  7     2       2 2019-04-09 09:47:42 2019-04-09 09:52:53 188.420 secs
#  8     2       3 2019-04-09 09:47:42 2019-04-09 09:53:23  29.630 secs
#  9     2       4 2019-04-09 09:47:42 2019-04-09 09:54:00  37.434 secs
# 10     2       5 2019-04-09 09:47:42 2019-04-09 09:54:52  51.673 secs
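The pipeline above works because start.time is constant within a student: differencing submit.time - start.time is the same as differencing submit.time, and the first row keeps submit.time - start.time. For readers not using dplyr, the same per-student rule can be sketched in base R with ave(). Like both pipelines in this answer, it assumes rows are already ordered by submission time within each student; tz = "UTC" is an assumption.

```r
# Data from the question
id <- c(1,1,1,1,1, 2,2,2,2,2)
start.time <- rep(c("04/09/2019 09:50:02.317 AM", "04/09/2019 09:47:42.583 AM"), each = 5)
submit.time <- c("2019-04-09 09:50:30.340", "2019-04-09 09:52:12.440", "2019-04-09 09:52:15.787",
                 "2019-04-09 09:53:21.587", "2019-04-09 09:53:49.047",
                 "2019-04-09 09:49:45.243", "2019-04-09 09:52:53.663", "2019-04-09 09:53:23.293",
                 "2019-04-09 09:54:00.727", "2019-04-09 09:54:52.400")

# Seconds since the epoch (tz = "UTC" assumed for both columns)
start_s  <- as.numeric(as.POSIXct(start.time,  format = "%m/%d/%Y %I:%M:%OS %p", tz = "UTC"))
submit_s <- as.numeric(as.POSIXct(submit.time, format = "%Y-%m-%d %H:%M:%OS",    tz = "UTC"))

# ave() passes each student's row indices to FUN. Prepending that student's
# login time and taking successive differences gives:
#   first row  -> submit - start
#   later rows -> submit - previous submit
time.spent <- ave(seq_along(id), id,
                  FUN = function(i) diff(c(start_s[i[1]], submit_s[i])))
```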

Using @akrun's suggestion, we can shorten the code a bit:

data %>%
  group_by(id) %>%
  mutate(
    start.time = as.POSIXct(start.time, format = "%m/%d/%Y %H:%M:%OS"),
    submit.time = as.POSIXct(submit.time),
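    # difftime() with units = "secs" fixes the unit; plain `-` picks units automatically per group
    time.spent = difftime(submit.time, lag(submit.time, default = first(start.time)), units = "secs")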
  ) %>%
  ungroup()

(and optionally drop the columns you no longer need).
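The format string "%m/%d/%Y %H:%M:%OS" used above works on this data only because strptime() ignores trailing characters (here " AM") and every login happens before noon; a PM login would be read 12 hours early. The sketch below compares %H with %I plus %p on a hypothetical afternoon timestamp that is not in the question's data. %p is matched against the current locale's AM/PM strings, so an English or C locale is assumed, as is tz = "UTC".

```r
# Hypothetical afternoon login, not part of the question's data
x <- "04/09/2019 01:15:00.000 PM"

# %H ignores the trailing " PM", so this is read as 01:15
h24 <- as.POSIXct(x, format = "%m/%d/%Y %H:%M:%OS", tz = "UTC")

# %I + %p honours the AM/PM marker, so this is read as 13:15
h12 <- as.POSIXct(x, format = "%m/%d/%Y %I:%M:%OS %p", tz = "UTC")

print(format(c(h24, h12), "%H:%M"))
```

Swapping the second format string into either pipeline should leave the output above unchanged, since all of its times are AM.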

Calculating time spent from date and time formats in R
