R - Parsing several dates contained in a string

Time: 2018-08-17 13:20:41

Tags: r date parsing

I am looking for a way to parse multiple dates, in arbitrary formats, that are contained in a single string.

    my_date <- "6 May 2018 10$ party 2018-01-05 meeting"
    my_parser(my_date)
    # expected: c("2018-05-06 GMT", "2018-01-05 GMT")

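The parsedate package can parse any date format, but only if the string contains a single date. When the string contains two dates, only one of them is returned: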

    library(parsedate)
    parse_date("06 may 2018 10$ party", approx = TRUE)
    # "2018-05-06 GMT"
    parse_date("06 may 2018 10$ party 2018-01-05 meeting", approx = TRUE)
    # "2018-01-05 GMT"

1 answer:

Answer 0 (score: 1)

You can replace the month names with month numbers, split the string on its words, and apply parse_date to each of the resulting pieces:

    library(parsedate)
    library(stringr)

    my_date <- "6 May 2018 10$ party 2018-01-05 meeting"

    # Replace each abbreviated month name with its number ("May" -> "5").
    # month.abb is base R's built-in vector of English month abbreviations.
    text <- my_date
    for (i in seq_along(month.abb)) {
      text <- gsub(month.abb[i], as.character(i), text, fixed = TRUE)
    }

    # Turn every word that still contains a letter into a ";" separator,
    # then split on ";" so that dates separated by words land in different chunks.
    words <- str_split(text, " ")[[1]]
    words[grepl("[A-Za-z]", words)] <- ";"
    chunks <- str_split(paste(words, collapse = " "), ";")[[1]]

    # Parse each chunk and drop the ones that did not yield a date.
    dates <- parse_date(chunks)
    dates <- dates[!is.na(dates)]
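
If the date formats are known in advance, a different approach may be to pull the date-like substrings out with a regular expression first and convert each match separately. The following is only a minimal sketch in base R, not part of parsedate: `extract_dates` is a hypothetical helper, and it assumes that only two formats occur ("2018-01-05" and "6 May 2018"). It returns `Date` values rather than the GMT date-times shown in the question.

```r
# Hypothetical helper (not part of parsedate): extract date-like substrings
# with a regular expression, then convert each match to a Date.
extract_dates <- function(text) {
  # Assumed formats: "2018-01-05" (ISO) or "6 May 2018" (day, month name, year).
  pattern <- "[0-9]{4}-[0-9]{2}-[0-9]{2}|[0-9]{1,2} [A-Za-z]{3,9} [0-9]{4}"
  hits <- regmatches(text, gregexpr(pattern, text))[[1]]
  if (length(hits) == 0) {
    return(as.Date(character(0), format = "%Y-%m-%d"))
  }

  parse_one <- function(x) {
    if (grepl("^[0-9]{4}-", x)) {
      return(as.Date(x, format = "%Y-%m-%d"))
    }
    parts <- strsplit(x, " ")[[1]]
    # Compare the first three letters with month.abb, ignoring case, so the
    # result does not depend on the session locale. Unknown names give NA.
    month <- match(tolower(substr(parts[2], 1, 3)), tolower(month.abb))
    iso <- sprintf("%s-%02d-%02d", parts[3], month, as.integer(parts[1]))
    as.Date(iso, format = "%Y-%m-%d")
  }

  # do.call(c, ...) keeps the Date class, which sapply() would drop.
  do.call(c, lapply(hits, parse_one))
}

my_date <- "6 May 2018 10$ party 2018-01-05 meeting"
dates <- extract_dates(my_date)
print(dates)
# [1] "2018-05-06" "2018-01-05"
```

The trade-off is that every format has to be listed in the pattern, whereas the split-and-parse approach above leaves format detection to parse_date.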