Let s1 be a vector of different time values:
s1 = c("PT1H57M3S", "PT1H3M46S","PT1H33S","PT1H2M", "PT18S","PT18M9S", "PT1H39M22S")
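I want to separate the hour, minute, and second values. For example, PT1H57M3S should go into columns H, M, S as 1, 57, 3. I have shown only a few of the different string values here; in practice the vector is a column of a data frame. Please suggest how to do this in R.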
Answer 0: (score: 2)
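We can split each string at the boundaries between letters and digits, convert each result to a data.frame with the unit letters as column names, and then combine the pieces with rbindlist from data.table (fill = TRUE fills missing units with NA):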
library(data.table)
rbindlist(
lapply(strsplit(s1, "(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])", perl = TRUE),
function(x) {
x1 <- x[-1];val <- x1[seq(1, length(x1), by = 2)]
nm <- x1[seq(2, length(x1), by = 2)]
setNames(as.data.frame.list(val), nm)}),
fill = TRUE)
# H M S
#1: 1 57 3
#2: 1 3 46
#3: 1 NA 33
#4: 1 2 NA
#5: NA NA 18
#6: NA 18 9
#7: 1 39 22
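To make the pairing step above concrete, here is a minimal base R sketch of the same idea for the question's s1. It swaps the lookaround strsplit for gregexpr/regmatches, which is an assumption of this sketch rather than the answer's method; tokens and parse_one are hypothetical names.

```r
s1 <- c("PT1H57M3S", "PT1H3M46S", "PT1H33S", "PT1H2M",
        "PT18S", "PT18M9S", "PT1H39M22S")

# Tokenise each string into alternating runs of letters and digits
tokens <- regmatches(s1, gregexpr("[0-9]+|[A-Z]+", s1))
tokens[[1]]
# "PT" "1" "H" "57" "M" "3" "S"

# Drop the leading "PT"; odd positions are values, even positions are units
parse_one <- function(x) {
  x1   <- x[-1]
  val  <- as.integer(x1[seq(1, length(x1), by = 2)])
  unit <- x1[seq(2, length(x1), by = 2)]
  # Look up H, M, S by name; units that are absent come back as NA
  setNames(val, unit)[c("H", "M", "S")]
}

res <- t(vapply(tokens, parse_one, integer(3)))
colnames(res) <- c("H", "M", "S")
res
```

The result is an integer matrix with one row per input string; wrapping it in as.data.frame(res) would give a data frame comparable to the rbindlist output.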
We can also use the tidyverse, extracting the numbers and the unit letters separately and pairing them up:
library(tidyverse)
library(stringi)
out <- map2_df(stri_extract_all_regex(s1, "\\d+"),
stri_extract_all_regex(s1, "[HMS]"), ~ .x %>%
as.integer %>%
as.list %>%
set_names(.y) )
out
#A tibble: 7 x 3
# H M S
# <int> <int> <int>
#1 1 57 3
#2 1 3 46
#3 1 NA 33
#4 1 2 NA
#5 NA NA 18
#6 NA 18 9
#7 1 39 22
If we need to replace NA with 0:
out[is.na(out)] <- 0
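One reason to replace NA with 0 is that arithmetic on the columns otherwise propagates NA. A minimal sketch, assuming a plain data frame out_df (a hypothetical stand-in for out, typed in by hand from the output above) and a made-up total_sec column:

```r
out_df <- data.frame(H = c(1, 1, 1, 1, NA, NA, 1),
                     M = c(57, 3, NA, 2, NA, 18, 39),
                     S = c(3, 46, 33, NA, 18, 9, 22))

# With NA present, the sum is NA for every incomplete row
with(out_df, H * 3600 + M * 60 + S)

# Replace NA with 0, then the arithmetic is defined for all rows
out_df[is.na(out_df)] <- 0
out_df$total_sec <- with(out_df, H * 3600 + M * 60 + S)
out_df$total_sec
# 7023 3826 3633 3720 18 1089 5962
```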
Or, if we need to convert to a date-time class:
library(lubridate)
v1 <- parse_date_time(sub("^PT", "", s1),
orders = rlang::syms(tolower(unique(gsub("[^HMS]+", "", s1)))))
# second() extracts the seconds component; seconds() would build a Period instead
tibble(Hour = hour(v1), Minute = minute(v1), Seconds = second(v1))
# A tibble: 7 x 3
# Hour Minute Seconds
# <int> <int> <dbl>
#1 1 57 3
#2 1 3 46
#3 1 0 33
#4 1 0 2
#5 0 0 18
#6 18 0 9
#7 1 39 22
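Here, the parsing orders are selected programmatically from the unit letters in the input strings. Note that in the output above, rows 4 and 6 differ from the other approaches: for PT1H2M the 2 lands in Seconds, and for PT18M9S the 18 lands in Hour, so the guessed order does not always line up with the unit letters.
Alternatively, we can use base R only: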
v1 <- do.call(pmax, c(lapply(paste0("PT", gsub("(.)", "%\\1\\1",
unique(gsub("[^HMS]+", "", s1)))), strptime, x = s1), list(na.rm= TRUE)))
data.frame(hour = v1$hour, minute = v1$min, sec = v1$sec)
# hour minute sec
#1 1 57 3
#2 1 3 46
#3 1 0 33
#4 1 2 0
#5 0 0 18
#6 0 18 9
#7 1 39 22
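The base R one-liner is dense, so here is a short sketch of how its strptime format strings are built from the input. The names units and fmts are hypothetical and do not appear in the answer.

```r
s1 <- c("PT1H57M3S", "PT1H3M46S", "PT1H33S", "PT1H2M",
        "PT18S", "PT18M9S", "PT1H39M22S")

# Keep only the unit letters of each string, then de-duplicate
units <- unique(gsub("[^HMS]+", "", s1))
units
# "HMS" "HS" "HM" "S" "MS"

# Turn each letter X into "%XX": a strptime conversion followed by the literal letter
fmts <- paste0("PT", gsub("(.)", "%\\1\\1", units))
fmts
# "PT%HH%MM%SS" "PT%HH%SS" "PT%HH%MM" "PT%SS" "PT%MM%SS"
```

Each format is then tried against every string. A format that does not fit yields NA, and since strptime ignores trailing characters, a format that fits only a prefix yields an earlier time. That is presumably why the answer combines the candidates with pmax(..., na.rm = TRUE).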
Answer 1: (score: 1)
Here is a base R solution:
df <- data.frame(H = s1, M = s1, S = s1, stringsAsFactors = FALSE)
df$H <- regmatches(df$H, regexec("\\d{1,2}(?=H)", df$H, perl = TRUE))
df$M <- regmatches(df$M, regexec("\\d{1,2}(?=M)", df$M, perl = TRUE))
df$S <- regmatches(df$S, regexec("\\d{1,2}(?=S)", df$S, perl = TRUE))
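# regexec()/regmatches() return character(0) for non-matches; map those to NA, then convert to integer
df[] <- lapply(df, function(col) as.integer(sapply(col, function(m) if (length(m)) m else NA)))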
# Output
H M S
1 1 57 3
2 1 3 46
3 1 NA 33
4 1 2 NA
5 NA NA 18
6 NA 18 9
7 1 39 22
Answer 2: (score: 1)
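A more robust solution is to parse the times into some time class rather than splitting them into separate variables: hms or chron (or even just difftime or POSIXct). Currently hms is a good choice, because it is well supported if you are using the tidyverse.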
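All that said, the hard part is not really the conversion but parsing into one of the above in the first place. A one-shot way to do so is lubridate::parse_date_time, which parses to POSIXct but tries the supplied formats in turn until one works, which saves a lot of control-flow code.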
s1 <- c("PT1H57M3S", "PT1H3M46S","PT1H33S","PT1H2M", "PT18S","PT18M9S", "PT1H39M22S")
hms::as.hms(
lubridate::parse_date_time(
s1,
# token orders to try, in order
orders = c('PT%HH%MM%SS', 'PT%HH%SS', 'PT%MM%SS', 'PT%SS'),
exact = TRUE, # take orders as literal strptime-style formats
truncated = 2), # allow 0-2 missing tokens on end of orders
tz = 'UTC') # parse_date_time returns POSIXct in UTC time zone
#> 01:57:03
#> 01:03:46
#> 01:00:33
#> 01:02:00
#> 00:00:18
#> 00:18:09
#> 01:39:22
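For comparison, the same values can be held as a base R difftime without extra packages. A minimal sketch: get_unit is a hypothetical helper, not part of any package, and the regex extraction is an assumption of this sketch rather than the answer's parse_date_time approach.

```r
s1 <- c("PT1H57M3S", "PT1H3M46S", "PT1H33S", "PT1H2M",
        "PT18S", "PT18M9S", "PT1H39M22S")

# Digits immediately before a unit letter, or 0 when the unit is absent
get_unit <- function(x, unit) {
  m <- regexpr(paste0("[0-9]+(?=", unit, ")"), x, perl = TRUE)
  out <- numeric(length(x))                      # zeros by default
  out[m != -1] <- as.numeric(regmatches(x, m))   # fill in only the matches
  out
}

# Total duration in seconds, stored as a difftime
secs <- as.integer(3600 * get_unit(s1, "H") + 60 * get_unit(s1, "M") + get_unit(s1, "S"))
d <- as.difftime(secs, units = "secs")

# Format as HH:MM:SS for display
fmt <- sprintf("%02d:%02d:%02d", secs %/% 3600L, (secs %% 3600L) %/% 60L, secs %% 60L)
fmt
# "01:57:03" "01:03:46" "01:00:33" "01:02:00" "00:00:18" "00:18:09" "01:39:22"
```

Unlike a clock-time class, a plain duration in seconds is not capped at 24 hours, which may matter if the real data contains longer durations.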
Answer 3: (score: 0)
You can use base R, rewriting each string as a comma-separated row and reading it with read.csv:
a=sub("PT(\\d+H)?(\\d+M)?(\\d+S)?","\\1,\\2,\\3",s1)
read.csv(h=F,text=gsub("[HMS]","",a),col.names = c("H","M","S"))
H M S
1 1 57 3
2 1 3 46
3 1 NA 33
4 1 2 NA
5 NA NA 18
6 NA 18 9
7 1 39 22
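To see why this works, it may help to look at the intermediate strings. The sketch below reuses the answer's two regular expressions on the question's s1; the name csv_rows is hypothetical.

```r
s1 <- c("PT1H57M3S", "PT1H3M46S", "PT1H33S", "PT1H2M",
        "PT18S", "PT18M9S", "PT1H39M22S")

# Each optional group captures digits plus a unit letter; a missing unit leaves an empty field
a <- sub("PT(\\d+H)?(\\d+M)?(\\d+S)?", "\\1,\\2,\\3", s1)
a
# "1H,57M,3S" "1H,3M,46S" "1H,,33S" "1H,2M," ",,18S" ",18M,9S" "1H,39M,22S"

# Stripping the unit letters leaves plain CSV rows with three fields each
csv_rows <- gsub("[HMS]", "", a)
csv_rows
# "1,57,3" "1,3,46" "1,,33" "1,2," ",,18" ",18,9" "1,39,22"
```

Because every row always has exactly two commas, read.csv sees three columns and turns the empty fields into NA.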