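I have a data frame containing the start and stop times of experiments, and I want to calculate the duration of each experiment (one experiment per row). The data frame: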
start_t stop_t
7:35 7:48
23:50 00:15
11:22 12:06
I wrote a function that converts the times to POSIXct and calculates the duration, testing whether start and stop straddle midnight:
TimeDiff <- function(t1,t2) {
  if (as.numeric(as.POSIXct(paste("2016-01-01", t1))) > as.numeric(as.POSIXct(paste("2016-01-01", t2)))) {
    t1n <- as.numeric(as.POSIXct(paste("2016-01-01", t1)))
    t2n <- as.numeric(as.POSIXct(paste("2016-01-02", t2)))
  }
  if (as.numeric(as.POSIXct(paste("2016-01-01", t1))) < as.numeric(as.POSIXct(paste("2016-01-01", t2)))) {
    t1n <- as.numeric(as.POSIXct(paste("2016-01-01", t1)))
    t2n <- as.numeric(as.POSIXct(paste("2016-01-01", t2)))
  }
  #calculate time-difference in seconds
  t2n - t1n
}
I then wanted to apply this function to my data frame using the 'mutate' function in 'dplyr', or an 'apply' function, e.g.:
mutate(df, dur = TimeDiff(start_t, stop_t))
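But the resulting 'dur' column is filled with the same value. I ended up using a clunky for loop to apply my function to the data frame, but I would like a more elegant solution. Help needed!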
Answer 0 (score: 0)
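Since you have only times and no dates, experiments that cross midnight are indeed a problem. Your function does not work because it is not vectorized: `if` tests a single condition rather than one per row, so the function does not compute the difference for each element on its own.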
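To illustrate the vectorization point, here is a minimal base R sketch of the original function rewritten with `ifelse()`, which evaluates the midnight test once per element. The name `time_diff_vec` and the fixed `tz = "UTC"` are assumptions made for this example (a fixed time zone sidesteps daylight-saving shifts), and the sketch assumes that no experiment lasts 24 hours or longer.

```r
# Hypothetical vectorized variant of TimeDiff (name chosen for this example).
time_diff_vec <- function(t1, t2) {
  # Put both times on the same arbitrary day and convert to seconds.
  t1n <- as.numeric(as.POSIXct(paste("2016-01-01", t1), tz = "UTC"))
  t2n <- as.numeric(as.POSIXct(paste("2016-01-01", t2), tz = "UTC"))
  # Where stop is earlier than start, assume it falls on the next day.
  t2n <- ifelse(t2n < t1n, t2n + 24 * 60 * 60, t2n)
  # Duration in seconds, one value per element.
  t2n - t1n
}

time_diff_vec(c("7:35", "23:50", "11:22"), c("7:48", "00:15", "12:06"))
#> [1]  780 1500 2640
```

Because the function now returns one value per row, it should work directly inside `mutate(df, dur = time_diff_vec(start_t, stop_t))`.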
The following works but is still not perfect:
library(tidyverse)
diff_time <- function(start, end) {
  case_when(start < end ~ end - start,
            start > end ~ parse_time("23:59") - start + end + parse_time("0:01")
  )
}
df %>%
  mutate_all(parse_time) %>%
  mutate(duration = diff_time(start_t, stop_t))
#>    start_t   stop_t  duration
#> 1 07:35:00 07:48:00  780 secs
#> 2 23:50:00 00:15:00 1500 secs
#> 3 11:22:00 12:06:00 2640 secs
If you had dates as well, you could simply do:
df %>%
  mutate(duration = stop_t - start_t)
# Data used in the examples above
df <- read.table(text = "start_t stop_t
7:35 7:48
23:50 00:15
11:22 12:06", header = TRUE, stringsAsFactors = FALSE)
Answer 1 (score: 0)
The simplest way I can think of is to use lubridate:
library(lubridate)
library(dplyr)
# make a fake df
df <- data.frame(start = c('7:35', '23:50', '11:22'),
                 stop = c('7:48', '00:15', '12:06'),
                 stringsAsFactors = FALSE)
# convert to lubridate periods, then subtract
# note: ms() parses "minutes:seconds", so these hour:minute strings are labeled M/S in the output
df %>%
  mutate(start = ms(start), stop = ms(stop)) %>%
  mutate(dur = stop - start)
Output:
     start    stop       dur
1   7M 35S  7M 48S       13S
2  23M 50S     15S -23M -35S
3  11M 22S  12M 6S   1M -16S
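The problem in your case is that the second row confuses lubridate: the difference comes out as minus 23 hours and some minutes (labeled as minutes and seconds in the output above), because all of the times are assumed to fall on the same day. You should add the date: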
library(lubridate)
library(dplyr)
# make a fake df, this time with dates
df <- data.frame(start = c('2017/10/08 7:35', '2017/10/08 23:50', '2017/10/08 11:22'),
                 stop = c('2017/10/08 7:48', '2017/10/09 00:15', '2017/10/08 12:06'),
                 stringsAsFactors = FALSE)
# parse as date-times, then subtract
df %>%
  mutate(start = ymd_hm(start), stop = ymd_hm(stop)) %>%
  mutate(dur = stop - start)
Output:
                start                stop     dur
1 2017-10-08 07:35:00 2017-10-08 07:48:00 13 mins
2 2017-10-08 23:50:00 2017-10-09 00:15:00 25 mins
3 2017-10-08 11:22:00 2017-10-08 12:06:00 44 mins
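If the dates are not recorded, one possible workaround is to take the difference modulo one day, which wraps a negative difference into the next day. The sketch below uses only base R. The helper `to_minutes` and the vector names are hypothetical, and it assumes every experiment is shorter than 24 hours and that times are given as "H:M" strings.

```r
# Hypothetical helper: convert "H:M" strings to minutes since midnight.
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1))
}

start_t <- c("7:35", "23:50", "11:22")
stop_t  <- c("7:48", "00:15", "12:06")

# In R, %% returns a non-negative result for a positive divisor,
# so a stop time "before" the start wraps around midnight.
dur_min <- (to_minutes(stop_t) - to_minutes(start_t)) %% (24 * 60)
dur_min
#> [1] 13 25 44
```

The result matches the durations obtained with explicit dates above, without having to invent a date for each row.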
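Answer 2 (score: 0)
When a timestamp passes midnight, the day can be incremented. I'm not sure whether it is necessary to test whether start and stop cross midnight. Hope this helps!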
df = data.frame(start_t = c("7:35", "23:50", "11:22"),
                stop_t = c("7:48", "00:15", "12:06"),
                stringsAsFactors = FALSE)
myfun = function(tvec1, tvec2, units_args="secs") {
  tvec1_t = as.POSIXct(paste("2016-01-01", tvec1))
  tvec2_t = as.POSIXct(paste("2016-01-01", tvec2))
  time_diff = difftime(tvec2_t, tvec1_t, units = units_args)
  return( time_diff )
}
# append new columns (base R)
df$time_diff = myfun(df$start_t, df$stop_t)
df$cross = ifelse(df$time_diff < 0, 1, 0)
Output:
  start_t stop_t   time_diff cross
1    7:35   7:48    780 secs     0
2   23:50  00:15 -84900 secs     1
3   11:22  12:06   2640 secs     0
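The output above only flags the row that crosses midnight; its difference is still negative. One way to finish the calculation, sketched here under the assumption that a flagged row ends on the following day, is to add one day (86400 seconds) wherever the flag is set. The column name `duration_secs` and the fixed `tz = "UTC"` are choices made for this example, not part of the answer.

```r
df <- data.frame(start_t = c("7:35", "23:50", "11:22"),
                 stop_t = c("7:48", "00:15", "12:06"),
                 stringsAsFactors = FALSE)

# Raw same-day difference in seconds (negative when stop is past midnight).
secs <- as.numeric(difftime(
  as.POSIXct(paste("2016-01-01", df$stop_t), tz = "UTC"),
  as.POSIXct(paste("2016-01-01", df$start_t), tz = "UTC"),
  units = "secs"
))

# Flag midnight crossings, then shift those rows forward by one day.
df$cross <- as.integer(secs < 0)
df$duration_secs <- secs + df$cross * 86400

df$duration_secs
#> [1]  780 1500 2640
```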