R: Convert YouTube video duration format to a proper time (seconds)

Time: 2018-03-06 13:25:33

Tags: r string-formatting

I have a vector (a column of data) in R that holds YouTube playback durations as strings.

x <- c("PT1H8S", "PT9M55S", "PT13M57S", "PT1M5S", "PT30M12S", "PT1H21M5S", "PT6M48S", "PT31S", "PT2M")

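How can I get rid of the PT and then get the total duration in seconds?

The resulting vector should be c(3608, 595, 837, 65, 1812, 4865, 408, 31, 120)

Example: PT1H21M5S in seconds = 4865 (calculated as 1H = 1*3600, 21M = 21*60, 5S = 5*1).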
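The arithmetic for that example can be checked directly in R. This is only a sketch of the calculation for one value, with the three components typed in by hand; the variable names are illustrative.

```r
# Components of "PT1H21M5S": 1 hour, 21 minutes, 5 seconds
hours   <- 1
minutes <- 21
seconds <- 5

# One hour is 3600 seconds, one minute is 60 seconds
total <- hours * 3600 + minutes * 60 + seconds
total
```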

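3 Answers:

Answer 0: (score: 2)

I wrote a small sapply loop with regex commands that strips everything except the seconds, minutes, or hours and then converts everything to seconds.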

x <- c("PT1H8S", "PT9M55S", "PT13M57S", "PT1M5S", "PT30M12S", "PT1H21M5S", "PT6M48S")
x2 <- sapply(x, function(i){
  t <- as.numeric(gsub("^(.*)M|^(.*)H|S$", "", i))
  if(grepl("M", i)) t <- t + as.numeric(gsub("^(.*)PT|^(.*)H|M(.*)$", "",i)) * 60
  if(grepl("H", i)) t <- t + as.numeric(gsub("^(.*)PT|H(.*)$", "",i)) * 3600
  t
})
x2
   PT1H8S   PT9M55S  PT13M57S    PT1M5S  PT30M12S PT1H21M5S   PT6M48S 
     3608       595       837        65      1812      4865       408 

Edit: as requested, extended to handle inputs such as "PT31S" and "PT2M":

x <- c("PT1H8S", "PT9M55S", "PT13M57S", "PT1M5S", "PT30M12S", "PT1H21M5S", "PT6M48S", "PT31S", "PT2M")
x2 <- sapply(x, function(i){
  t <- 0
  if(grepl("S", i)) t <- t + as.numeric(gsub("^(.*)PT|^(.*)M|^(.*)H|S$", "", i))
  if(grepl("M", i)) t <- t + as.numeric(gsub("^(.*)PT|^(.*)H|M(.*)$", "",i)) * 60
  if(grepl("H", i)) t <- t + as.numeric(gsub("^(.*)PT|H(.*)$", "",i)) * 3600
  t
})
x2
   PT1H8S   PT9M55S  PT13M57S    PT1M5S  PT30M12S PT1H21M5S   PT6M48S     PT31S      PT2M 
     3608       595       837        65      1812      4865       408        31       120 

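This should cover all cases. If more come up, the trick is to adjust the regular expression. ^ is the start of the string and $ is the end. (.*) matches anything. So ^(.*)H means everything between the start and H, which we replace with an empty string.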
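To make the regex pieces concrete, here is a minimal sketch that isolates the minutes of a single duration one step at a time. It uses sub() with one pattern per step rather than the combined gsub() alternation in the answer, and the names after_h and minutes are only illustrative.

```r
i <- "PT1H21M5S"

# ^(.*)H : everything from the start of the string up to and including "H"
after_h <- sub("^(.*)H", "", i)
after_h

# M(.*)$ : "M" and everything after it, up to the end of the string
minutes <- sub("M(.*)$", "", after_h)
minutes

# What is left is the minute count, which converts to seconds
minute_seconds <- as.numeric(minutes) * 60
minute_seconds
```

The first call leaves "21M5S", the second leaves "21", and the last line gives 1260.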

Answer 1: (score: 1)

Here is a dplyr and stringr solution (replace_na comes from tidyr):

library(dplyr)
library(stringr)
library(tidyr)  # for replace_na()

df %>%
  # extract hours, minutes, and seconds and convert to numeric:
  mutate(
    h = as.numeric(str_extract(x, "(?<=PT)\\d+(?=H)")),
    m = as.numeric(str_extract(x, "(?<=PT|H)\\d+(?=M)")),
    s = as.numeric(str_extract(x, "(?<=PT|H|M)\\d+(?=S)"))
  ) %>%
  # replace NA with 0 in the numeric columns only (x stays character):
  mutate(
    across(c(h, m, s), ~ replace_na(.x, 0))
  ) %>%
  # calculate time in seconds:
  mutate(sec = h*3600+m*60+s)
          x h  m  s  sec
1    PT1H8S 1  0  8 3608
2   PT9M55S 0  9 55  595
3  PT13M57S 0 13 57  837
4    PT1M5S 0  1  5   65
5  PT30M12S 0 30 12 1812
6 PT1H21M5S 1 21  5 4865
7   PT6M48S 0  6 48  408
8     PT31S 0  0 31   31
9      PT2M 0  2  0  120

Data:

df <- data.frame(x = c("PT1H8S", "PT9M55S", "PT13M57S", "PT1M5S", "PT30M12S", "PT1H21M5S", "PT6M48S", "PT31S", "PT2M"))
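For comparison, the same three steps (extract each unit, treat a missing unit as 0, sum up) can be sketched in base R without extra packages. This is an assumption-laden sketch, not part of the answer: get_unit is a hypothetical helper name, and it assumes every input is of the form PT[nH][nM][nS] with whole numbers.

```r
x <- c("PT1H8S", "PT9M55S", "PT13M57S", "PT1M5S", "PT30M12S",
       "PT1H21M5S", "PT6M48S", "PT31S", "PT2M")

# Hypothetical helper: the number directly in front of a unit letter,
# or 0 when that unit does not occur in the string
get_unit <- function(x, unit) {
  # lookahead (?=H) etc. needs perl = TRUE in base R
  m <- regexpr(paste0("[0-9]+(?=", unit, ")"), x, perl = TRUE)
  out <- rep(0, length(x))
  # regmatches() returns matches only for the elements that matched
  out[m != -1] <- as.numeric(regmatches(x, m))
  out
}

secs <- get_unit(x, "H") * 3600 + get_unit(x, "M") * 60 + get_unit(x, "S")
secs
```

For the nine inputs this gives 3608, 595, 837, 65, 1812, 4865, 408, 31, 120, matching the vector requested in the question.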

Answer 2: (score: -1)

You can use the lubridate package:

library(lubridate)

x <- c("PT1H8S", "PT9M55S", "PT13M57S", "PT1M5S", "PT30M12S", "PT1H21M5S", "PT6M48S")

x2 <- as.numeric(duration(x))

x2
[1] 3608  595  837   65 1812 4865  408