Convert a character time column to a workable time format in R

Posted: 2018-05-26 23:13:06

Tags: r datetime time timestamp lubridate

My question is about standardizing column b. I need this data in a format that makes it easier to build plots.

Example data:

a<- c("Jackson Brice / The Shocker","Flash Thompson", "Mr. Harrington","Mac Gargan","Betty Brant", "Ann Marie Hoag","Steve Rogers / Captain America", "Pepper Potts", "Karen") 
b<- c("2:30", "2:15", "2", "1:15", "1:15", "1", ":55",":45", "v")

ab <- cbind.data.frame(a,b)

                               a    b
1    Jackson Brice / The Shocker 2:30
2                 Flash Thompson 2:15
3                 Mr. Harrington    2
4                     Mac Gargan 1:15
5                    Betty Brant 1:15
6                 Ann Marie Hoag    1
7 Steve Rogers / Captain America  :55
8                   Pepper Potts  :45
9                          Karen    v

If possible, I would like the values in column b converted to a time format I can actually work with.

2 Answers:

Answer 0 (score: 1)

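I had to make some assumptions about what you are trying to do, for example the units, and what you want to happen to the character values, but hopefully this function gives you something to work with.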

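With times, the biggest challenge is that you need fairly clear rules when parsing them from text. Based on my results, I had to put a few if statements in the function to make it work, but whenever you can, keep the time format as consistent as possible.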
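To make those rules explicit, here is a minimal base-R sketch (not taken from either answer) that applies them to a whole vector at once instead of element by element. The helper name `parse_mmss` is hypothetical, and it assumes the same rules both answers use: `M:SS` is minutes and seconds, a bare number is whole minutes, `:SS` means zero minutes, and anything non-numeric (such as `"v"`) becomes 0 seconds.

```r
# Hypothetical helper: parse "M:SS"-style strings into numeric seconds.
# Assumed rules (same as the answers): "2" = 2 minutes, ":55" = 55 seconds,
# anything that is not digits with an optional ":SS" part counts as 0.
parse_mmss <- function(x) {
  x <- trimws(as.character(x))

  # Rule 1: invalid or empty entries (e.g. "v", NA) are replaced by "0"
  valid <- grepl("^[0-9]*(:[0-9]+)?$", x) & x != ""
  valid[is.na(valid)] <- FALSE
  x[!valid] <- "0"

  # Rule 2: a bare number with no ":" is read as whole minutes
  no_colon <- !grepl(":", x, fixed = TRUE)
  x[no_colon] <- paste0(x[no_colon], ":00")

  # Rule 3: a missing minute part (":55") means zero minutes
  x <- sub("^:", "0:", x)

  # Split into minutes and seconds and combine into total seconds
  parts <- strsplit(x, ":", fixed = TRUE)
  mins <- as.numeric(vapply(parts, `[`, character(1), 1))
  secs <- as.numeric(vapply(parts, `[`, character(1), 2))
  mins * 60 + secs
}

b <- c("2:30", "2:15", "2", "1:15", "1:15", "1", ":55", ":45", "v")
parse_mmss(b)
# returns 150 135 120 75 75 60 55 45 0
```

Because the result is a plain numeric vector, it can be stored directly as a new column, e.g. `ab$secs <- parse_mmss(ab$b)`, without any `lapply`/`rbindlist` step.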

library(lubridate)
library(data.table)  # provides rbindlist()

formatTime <- function(x) {

    # Check for a ":" separator in the text
    if (grepl(":", x, fixed = TRUE)) {

        y <- unlist(strsplit(x, ":", fixed = TRUE))

        # If there is no value before the ":" then put "00" in front of it
        if (y[1] == "") {
            z <- ms(paste("00", y[2], sep = ":"), quiet = TRUE)
        } else {
            z <- ms(paste(y, collapse = ":"), quiet = TRUE)
        }
    } else {

        # If there is no ":" then treat the value as minutes and append ":00"
        z <- ms(paste(x, "00", sep = ":"), quiet = TRUE)
    }

    # If it did not parse with ms(), i.e. it was a character, then assign zero time "00:00"
    if (is.na(z)) z <- ms("0:00")

    # Converted to a duration due to issues returning a period with lapply.
    # Return a data frame so that lapply keeps the units and the column name.
    return(data.frame(time = as.duration(z)))
}

# Convert the factor variable to character
ab$b <- as.character(ab$b)

ab <- cbind(ab, rbindlist(lapply(ab$b, formatTime)))

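I started out trying to use a period, but it would not come back correctly from the apply statement, so I converted it to a duration. This may look different from what your example shows, but it should work for plotting. If I have missed what you need, let me know and I will update the answer.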
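Whichever class ends up holding the times, plotting ultimately needs plain numbers in one consistent unit. A minimal base-R sketch of that last step, swapping in `difftime` (base R's number-plus-unit type) for lubridate's duration, and assuming the times have already been parsed to seconds:

```r
# Assumed input: times already parsed to seconds (e.g. "2:30" -> 150)
secs <- c(150, 135, 120, 75, 75, 60, 55, 45, 0)

# difftime stores a number together with its unit, similar in spirit to a duration
d <- as.difftime(secs, units = "secs")

# Re-express the same amounts of time in minutes
units(d) <- "mins"

# Drop the class to get a plain numeric vector (in minutes) for plotting
mins <- as.numeric(d)
mins
```

The `mins` vector can then be passed to a plotting function, for example `barplot(mins, names.arg = ab$a, horiz = TRUE, las = 1)`.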

Answer 1 (score: 0)

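A solution can be built with tidyr::separate and tidyr::unite. The approach is: first replace any value that contains alphabetic characters with 00:00:00, and append ":00" to values without a ":" so they are read as minutes. Then separate each value into its 3 parts. Using dplyr::mutate_at, all 3 columns are converted to two-digit "00" format. Finally, unite the three columns again.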

library(tidyverse)

ab %>% mutate_if(is.factor, as.character) %>%  # Change any factor to character
  mutate(b = ifelse(grepl("[[:alpha:]]", b), "00:00:00", b)) %>%
  mutate(b = ifelse(grepl(":", b), b, paste(b, "00", sep = ":"))) %>%
  separate(b, into = c("b1", "b2", "b3"), sep = ":", fill = "left", extra = "drop") %>%
  # funs() is deprecated in current dplyr; the ~ lambda does the same job
  mutate_at(vars(starts_with("b")),
            ~ sprintf("%02d", as.numeric(ifelse(is.na(.) | . == "", 0, .)))) %>%
  unite("b", starts_with("b"), sep = ":")

#                                a        b
# 1    Jackson Brice / The Shocker 00:02:30
# 2                 Flash Thompson 00:02:15
# 3                 Mr. Harrington 00:02:00
# 4                     Mac Gargan 00:01:15
# 5                    Betty Brant 00:01:15
# 6                 Ann Marie Hoag 00:01:00
# 7 Steve Rogers / Captain America 00:00:55
# 8                   Pepper Potts 00:00:45
# 9                          Karen 00:00:00

Data:

a<- c("Jackson Brice / The Shocker","Flash Thompson", "Mr. Harrington","Mac Gargan","Betty Brant",
 "Ann Marie Hoag","Steve Rogers / Captain America", "Pepper Potts", "Karen") 
b<- c("2:30", "2:15", "2", "1:15", "1:15", "1", ":55",":45", "v")

ab <- cbind.data.frame(a,b)
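The zero-padding step can also be done without splitting and re-uniting strings, once the times are reduced to a total number of seconds. A base-R sketch of that alternative (not part of the answer above), assuming the seconds vector is already available:

```r
# Assumed input: times already parsed to whole seconds
secs <- c(150, 135, 120, 75, 75, 60, 55, 45, 0)

# Split total seconds into hours, minutes and seconds with integer arithmetic
h <- secs %/% 3600
m <- (secs %% 3600) %/% 60
s <- secs %% 60

# Zero-pad each field to two digits, the same "%02d" idea used in the answer
hms_str <- sprintf("%02d:%02d:%02d", h, m, s)
hms_str
# returns "00:02:30" "00:02:15" "00:02:00" "00:01:15" "00:01:15" "00:01:00" "00:00:55" "00:00:45" "00:00:00"
```

Keeping the numeric `secs` around and only formatting at the end means the same data serves both for plotting and for display.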