I have looked at many questions about formatting times, but none of them covers the particular format my data is imported in:
Time <- c(
"22 hours 3 minutes 22 seconds",
"170 hours 15 minutes 20 seconds",
"39 seconds",
"2 days 6 hours 44 minutes 17 seconds",
"9 hours 54 minutes 36 seconds",
"357 hours 23 minutes 28 seconds",
"464 hours 30 minutes 7 seconds",
"51 seconds",
"31 hours 39 minutes 2 seconds",
"355 hours 29 minutes 10 seconds")
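Some entries contain only seconds, others only minutes and seconds, others days, hours, minutes and seconds, others days and seconds, and so on. I also need to preserve NA values. How can I convert this character vector into a numeric total number of days (i.e. add up the days, hours, minutes and seconds)?
For example: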
Time
8.10
19.3
0.68
2.28
48.1
0.00
0.70
0.1
3.2
13.9
Thanks!
Edit
This is an old question, but it now takes just one simple lubridate call:
(period_to_seconds(period(Time)) / 86400) %>% round(2)
Apart from needing %>% for readability, here is a trick that does the same thing without any packages:
Time_vec <- mapply(function(tt, to_days) {
ifelse(grepl(tt, Time), gsub(paste0("^.*?(\\d+) ", tt, ".*$"), "\\1", Time), 0) %>%
as.numeric() / to_days
},
c("day", "hour", "minute", "second"),
c(1, 24, 1440, 86400)
) %>%
apply(1, sum) %>%
round(2)
In my actual data, only one value differs from the lubridate solution: 0.96 versus 0.97.
Answer 0 (score: 3)
Again, no packages and a bit of regex:
Time <- c(
"22 hours 3 minutes 22 seconds",
"170 hours 15 minutes 20 seconds",
"39 seconds",
"6 hours 44 minutes 17 seconds",
"9 hours 54 minutes 36 seconds",
"357 hours 23 minutes 28 seconds",
"464 hours 30 minutes 7 seconds",
"51 seconds",
"31 hours 39 minutes 2 seconds",
"355 hours 29 minutes 10 seconds")
pat <- '(?:(\\d+) hours )?(?:(\\d+) minutes )?(?:(\\d+) seconds)?'
m <- regexpr(pat, Time, perl = TRUE)
m_st <- attr(m, 'capture.start')
m_ln <- attr(m, 'capture.length')
(mm <- mapply(function(x, y) as.numeric(substr(Time, x, y)),
data.frame(m_st), data.frame(m_st + m_ln - 1)))
(dd <- setNames(data.frame(mm), c('h','m','s')))
# h m s
# 1 22 3 22
# 2 170 15 20
# 3 NA NA 39
# 4 6 44 17
# 5 9 54 36
# 6 357 23 28
# 7 464 30 7
# 8 NA NA 51
# 9 31 39 2
# 10 355 29 10
round(rowSums(dd / data.frame(h = rep(24, nrow(dd)), m = 24 * 60, s = 24 * 60 * 60),
na.rm = TRUE), 3)
# [1] 0.919 7.094 0.000 0.281 0.413 14.891 19.354 0.001 1.319 14.812
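The question also mentions a days field and NA values, which the pattern above does not cover. A minimal base-R sketch of one way to handle both, assuming every unit is optional and may be written in the singular or plural; `to_total_days` and `unit_secs` are hypothetical names introduced here, not part of the original answer:

```r
# Hypothetical helper: total days from strings such as
# "2 days 6 hours 44 minutes 17 seconds"; NA inputs stay NA.
to_total_days <- function(x) {
  # Seconds per unit, assuming fixed-length days, hours and minutes
  unit_secs <- c(day = 86400, hour = 3600, minute = 60, second = 1)
  total <- numeric(length(x))
  for (u in names(unit_secs)) {
    # A number followed by the unit name, singular or plural
    pat <- paste0("(\\d+) ", u, "s?\\b")
    has_unit <- grepl(pat, x, perl = TRUE)
    # Extract the number where the unit is present, otherwise count it as 0
    value <- ifelse(has_unit,
                    sub(paste0("^.*?", pat, ".*$"), "\\1", x, perl = TRUE),
                    "0")
    total <- total + as.numeric(value) * unit_secs[[u]]
  }
  total[is.na(x)] <- NA  # preserve missing inputs
  total / 86400
}

days_demo <- to_total_days(c("22 hours 3 minutes 22 seconds",
                             "39 seconds",
                             "2 days 6 hours 44 minutes 17 seconds",
                             NA))
round(days_demo, 3)
```

Looping over the units instead of using one combined pattern keeps each regular expression short and makes it easy to add another unit such as weeks.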
Answer 1 (score: 2)
I suggest installing the stringr package. Then do this:
library(stringr)
options(digits=7)
returndays <- function(alist){
val <-length(alist)
#print(val)
hr <- vector()
min <- vector()
sec <- vector()
day <- vector()
for (i in 1:val){
myinfo <-"([1-9][0-9]{0,2}) hours"
hr[i] <-str_match(alist[i],myinfo)[,2]
myinfo2 <-"([1-9][0-9]{0,2}) minutes"
min[i] <-str_match(alist[i],myinfo2)[,2]
myinfo3 <-"([1-9][0-9]{0,2}) seconds"
sec[i] <-str_match(alist[i],myinfo3)[,2]
h <- as.numeric(hr[i])/24
m <- as.numeric(min[i])/1440
s <- as.numeric(sec[i])/86400
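# pass the components separately so that na.rm drops only the missing ones
day[i] <- sum(h, m, s, na.rm = TRUE)
}
return(day)
}
days <-returndays(Time)
days
[1] 9.190046e-01 7.093981e+00 4.513889e-04 2.807523e-01 4.129167e-01 1.489130e+01 1.935425e+01
[8] 5.902778e-04 1.318773e+00 1.481192e+01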
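A note on the summation step, as a small base-R sketch: with `na.rm = TRUE` it matters whether the components are added together before being passed to `sum()` or passed as separate arguments, because `NA + x` is `NA` in R. The values below are made up to mimic an entry such as "39 seconds", where hours and minutes are missing:

```r
# Components for an entry with seconds only: hours and minutes are NA
h <- NA_real_
m <- NA_real_
s <- 39 / 86400

# Adding first collapses everything to a single NA, which na.rm then removes
added_first <- sum(h + m + s, na.rm = TRUE)

# Passing the components separately removes only the missing ones
separate_args <- sum(h, m, s, na.rm = TRUE)

added_first    # 0
separate_args  # equal to 39 / 86400
```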
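Answer 2 (score: 2)
lubridate provides the period() function, which conveniently turns hours, minutes, seconds, etc. into a Period object that can easily be converted to seconds: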
period(days = 3, hours = 10, minutes = 3, seconds = 37)
## [1] "3d 10H 3M 37S"
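For reference, the arithmetic behind converting such a period to seconds and then to days can be sketched in base R. This assumes fixed-length units (1 day = 86400 s, 1 hour = 3600 s, 1 minute = 60 s); the names `secs_per` and `parts` are illustrative only and are not lubridate objects:

```r
# Assumed fixed unit lengths, in seconds
secs_per <- c(days = 86400, hours = 3600, minutes = 60, seconds = 1)

# The same components as in the period() call above
parts <- c(days = 3, hours = 10, minutes = 3, seconds = 37)

# Multiply each component by its unit length and add everything up
total_seconds <- sum(parts * secs_per[names(parts)])
total_seconds  # 295417

# Divide by the number of seconds in a day to get total days
total_days <- total_seconds / 86400
```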
I use this function to convert your strings:
to_days <- function(hms_char) {
# split string
v <- strsplit(hms_char, " ")[[1]]
# get numbers
idx <- seq(1, by = 2, length.out = length(v)/2)
nums <- as.list(as.numeric(v[idx]))
# get units and use them as names
names(nums) <- v[-idx]
# apply functions, sum and convert to days
duration <- do.call(period, nums)
days <- period_to_seconds(duration)/86400
return(days)
}
It works on a single string, so you need sapply to convert the full Time vector:
sapply(Time, to_days, USE.NAMES = FALSE)
## [1] 9.190046e-01 7.093981e+00 4.513889e-04 2.807523e-01 4.129167e-01 1.489130e+01 1.935425e+01
## [8] 5.902778e-04 1.318773e+00 1.481192e+01
Answer 3 (score: 1)
lubridate is useful here. hms automatically extracts the hours, minutes and seconds (saving you some regex), and time_length converts to days.
> library(lubridate)
> time_length(hms(Time), 'day')
estimate only: convert periods to intervals for accuracy
[1] 0.9190046 7.0939815 NA 0.2807523 0.4129167 14.8912963 19.3542477 NA
[9] 1.3187731 14.8119213
However, hms cannot parse a string that does not contain three numbers, so a little pre-scrubbing helps:
> library(stringr)
> Time2 <- sapply(Time, function(x){paste(paste(rep(0, 3 - str_count(x, '[0-9]+')), collapse = ' '), x)})
> time_length(hms(Time2), 'day')
estimate only: convert periods to intervals for accuracy
[1] 9.190046e-01 7.093981e+00 4.513889e-04 2.807523e-01 4.129167e-01 1.489130e+01 1.935425e+01
[8] 5.902778e-04 1.318773e+00 1.481192e+01