I was hoping fastPOSIXct would work, but it does not in this case.
Here is my time data (times only, no dates); I need to extract the hour from each value.
times <- c("9:46","11:06", "14:17", "19:53", "0:03", "3:56")
Here is the wrong output from fastPOSIXct:
fastPOSIXct(times, "GMT")
[1] "1970-01-01 00:00:00 GMT" "1970-01-01 00:00:00 GMT"
[3] "1970-01-01 00:00:00 GMT" "1970-01-01 00:00:00 GMT"
[5] "1970-01-01 00:00:00 GMT" "1970-01-01 00:00:00 GMT"
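Without a proper date, it cannot recognise the time.
The hour method from data.table, combined with as.ITime, does the job, but it looks slow on large arrays.
library(data.table)
hour(as.ITime(times))
# [1] 9 11 14 19 0 3
I am wondering whether there is a faster way (something like fastPOSIXct, but one that works without a date).
fastPOSIXct really is fast as a snap, but the result is wrong.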
Answer 0 (score: 11)
You can also try substr:
as.integer(substr(vals, start = 1, stop = nchar(vals) - 3))
In a benchmark on a vector with 1e6 elements, stringi::stri_sub is the fastest and substr comes second.
vals <- sample(c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56"), 1e6, replace = TRUE)
fun_substr <- function(vals) as.integer(substr(vals, start = 1, stop = nchar(vals) - 3))
grab.hrs <- function(vals) as.integer(sub(pattern = ":.*", replacement = "", x = vals))
fun_strtrim <- function(vals) as.integer(strtrim(vals, nchar(vals) - 3))
library(chron)
fun_chron <- function(vals) hours(times(paste0(vals, ":00")))
fun_lt <- function(vals) as.POSIXlt(vals, format="%H:%M")$hour
library(stringi)
fun_stri_sub <- function(vals) as.integer(stri_sub(vals, from = 1, to = -4))
library(microbenchmark)
microbenchmark(fun_substr(vals),
fun_stri_sub(vals),
grab.hrs(vals),
fun_strtrim(vals),
fun_lt(vals),
fun_chron(vals),
unit = "relative", times = 5)
# Unit: relative
# expr min lq mean median uq max neval
# fun_substr(vals) 2.186714 1.902074 2.015082 1.968542 1.945007 2.090236 5
# fun_stri_sub(vals) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 5
# grab.hrs(vals) 2.656630 2.397918 2.687133 2.426223 2.446902 3.263962 5
# fun_strtrim(vals) 31.177869 27.601380 26.009818 27.423562 17.902507 29.426989 5
# fun_lt(vals) 47.296929 41.122287 42.266556 40.647465 30.539030 52.710992 5
# fun_chron(vals) 5.594931 5.159192 5.961775 7.746242 5.286944 6.189742 5
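Before comparing timings it is worth confirming that the candidates agree with each other. The following is a minimal base-R sketch that reuses the three base-R function definitions from this answer on a small sample; the names small, results and all_agree are introduced here purely for illustration.

```r
# Small sample in the same "H:MM" / "HH:MM" format as the question
small <- c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56")

# Base-R candidates, defined as in the benchmark above
fun_substr  <- function(vals) as.integer(substr(vals, start = 1, stop = nchar(vals) - 3))
grab.hrs    <- function(vals) as.integer(sub(pattern = ":.*", replacement = "", x = vals))
fun_strtrim <- function(vals) as.integer(strtrim(vals, nchar(vals) - 3))

results <- list(
  substr  = fun_substr(small),
  sub     = grab.hrs(small),
  strtrim = fun_strtrim(small)
)

# TRUE when every candidate returns exactly the same integer vector as the first one
all_agree <- all(vapply(results, identical, logical(1), results[[1]]))
print(all_agree)
```

The same check could be extended to the chron and stringi variants if those packages are installed.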
Answer 1 (score: 10)
You can also do this with the times function from the chron package:
library(chron)
vals <- c("9:46","11:06", "14:17", "19:53", "0:03", "3:56")
dat <- times(paste0(vals, ":00"))
hours(dat)
# [1] 9 11 14 19 0 3
If speed matters, you can extract the hours faster with plain string manipulation:
grab.hrs <- function(vals) as.numeric(sub(pattern = ":.*", replacement = "",
x = vals))
grab.hrs(vals)
# [1] 9 11 14 19 0 3
times and as.POSIXlt (from @tonytonov's solution) appear to be somewhat faster than as.ITime, and string manipulation is faster still:
library(microbenchmark)
library(data.table)
microbenchmark(hours(times(paste0(vals, ":00"))),
hours(as.ITime(vals)),
as.POSIXlt(vals, format="%H:%M")$hour,
grab.hrs(vals))
# Unit: microseconds
# expr min lq median uq max neval
# hours(times(paste0(vals, ":00"))) 174.544 184.9485 193.5630 204.6950 5047.195 100
# hours(as.ITime(vals)) 665.833 678.8790 705.6445 735.0525 3030.574 100
# as.POSIXlt(vals, format = "%H:%M")$hour 158.264 169.8880 171.9670 180.1800 301.840 100
# grab.hrs(vals) 10.637 15.4540 20.0995 21.1285 55.985 100
Answer 2 (score: 6)
Is this an option? It is a base R solution.
as.POSIXlt(times, format="%H:%M")$hour
#[1] 9 11 14 19 0 3
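The parsed POSIXlt object carries the other clock fields as well, so the same call can return minutes if they are needed later. A minimal sketch, assuming tz = "UTC" (an assumption made here so the result does not depend on the local time zone); the names parsed, hrs and mins are illustrative.

```r
times <- c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56")

# Parse as clock times; the missing date is filled in with the current day
# and is simply ignored afterwards.
# tz = "UTC" is an assumption of this sketch, not part of the original answer.
parsed <- as.POSIXlt(times, format = "%H:%M", tz = "UTC")

hrs  <- parsed$hour  # hour of day, 0-23
mins <- parsed$min   # minute of hour, 0-59

print(hrs)
print(mins)
```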
Answer 3 (score: 6)
For a real speed-up, you can also simply strip the last 3 characters from the string. That is faster than using a regex.
as.numeric(strtrim(times, nchar(times) - 3))
## [1] 9 11 14 19 0 3
Here are the benchmark results:
Unit: microseconds
expr min lq median uq max neval
hours(times(paste0(vals, ":00"))) 200.670 212.9720 218.7960 221.8420 352.370 100
hours(as.ITime(vals)) 453.174 478.9680 487.3805 496.7885 1607.321 100
as.POSIXlt(vals, format = "%H:%M")$hour 41.278 46.4945 49.7310 51.3115 56.453 100
grab.hrs(vals) 12.352 15.4295 18.3850 20.3390 31.349 100
as.numeric(gsub("(.*):.*", "\\1", times)) 14.528 17.7225 20.6390 23.4530 53.683 100
as.numeric(strtrim(times, nchar(times) - 3)) 9.621 11.6605 12.7435 13.2520 147.446 100
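One trade-off worth keeping in mind: stripping the last 3 characters relies on every value ending in ":MM". The sketch below uses a hypothetical input with seconds (with_secs, not part of the original data) to show how the regex approach and the fixed-width approach diverge in that case.

```r
# Hypothetical input in "H:MM:SS" format, not from the original question
with_secs <- c("9:46:30", "11:06:05", "0:03:59")

# Regex approach: drop everything from the first colon onward
hrs_regex <- as.numeric(sub(":.*", "", with_secs))

# Fixed-width approach: removes only the last 3 characters, leaving e.g. "9:46",
# which as.numeric() cannot parse, so it yields NA (with a warning, suppressed here)
hrs_trim <- suppressWarnings(as.numeric(strtrim(with_secs, nchar(with_secs) - 3)))

print(hrs_regex)
print(hrs_trim)
```

If the input format is guaranteed to be H:MM or HH:MM, the fixed-width version remains the simpler choice.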
Answer 4 (score: 4)
You can use the stri_sub function from the stringi package to trim the last 3 characters, like so:
require(stringi)
times <- c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56")
stri_sub(times, from = 1, to = -4)
## [1] "9" "11" "14" "19" "0" "3"
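If the from and/or to arguments are negative, positions are counted from the end of the string. So in this example the substring runs from the first character to the fourth character from the end.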
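To make the counting rule concrete, here is a base-R sketch of the same selection using substr. The helper end_from_last is hypothetical and introduced only for illustration; it converts a negative end position (where -1 means the last character) into the positive position that substr expects.

```r
times <- c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56")

# Hypothetical helper: translate a negative end position, counted from the
# end of the string (-1 is the last character), into a positive position
end_from_last <- function(x, to) nchar(x) + to + 1

# to = -4 therefore becomes nchar(x) - 3, i.e. everything before ":MM"
hrs_chr <- substr(times, 1, end_from_last(times, -4))
print(hrs_chr)
```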
Answer 5 (score: 0)
str_sub or substr always comes in handy. For example, the following code works with substr:
library(stringr)  # provides str_pad
times <- c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56")
# Left-pad every value with "0" to a fixed width of 5 characters
times1 <- str_pad(times, 5, pad = "0")
times1
## [1] "09:46" "11:06" "14:17" "19:53" "00:03" "03:56"
substr(times1, 1, 2)
## [1] "09" "11" "14" "19" "00" "03"
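If adding a package dependency just for padding is undesirable, the same idea can be sketched in base R. This is one possible variant rather than the answer's own method; it assumes every value is at most 5 characters long, and the names padded and hrs are illustrative.

```r
times <- c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56")

# Prepend as many zeros as needed to reach 5 characters
# (assumes no value is longer than 5 characters)
padded <- paste0(strrep("0", 5 - nchar(times)), times)

# With a fixed width, the hour is always in the first two characters
hrs <- as.integer(substr(padded, 1, 2))

print(padded)
print(hrs)
```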