Using a regular expression to extract dates that may or may not include a time

Asked: 2013-11-26 08:27:09

Tags: regex r stringr

Consider the following:

library(stringr)

text <- c("blabla bla blabla bla 6:05, 15 July 2005, blabla bla", 
          "blabla bla bla 7:06, 3 November 2006, blabla bla",
          "blabla bla 24 November 2006, blabla bla",
          "blabla bla blabla bla bla blabla bla")

dates <- str_extract_all(text, ???)

I am trying to extract every date from the vector, together with its time when one is present.

1 answer:

Answer 0 (score: 1)

Next time, please show what you have tried. The following works, though there may be a more efficient regex pattern:

pat <- paste0("([0-9]{1,2}:[0-9]{2}, )*[0-9]{1,2} (", paste(month.name, collapse = "|"), ") [0-9]{4}")

pat
## [1] "([0-9]{1,2}:[0-9]{2}, )*[0-9]{1,2} (January|February|March|April|May|June|July|August|September|October|November|December) [0-9]{4}"


regmatches(text, gregexpr(pat, text = text))
## [[1]]
## [1] "6:05, 15 July 2005"
## 
## [[2]]
## [1] "7:06, 3 November 2006"
## 
## [[3]]
## [1] "24 November 2006"
## 
## [[4]]
## character(0)
## 


# or using the stringr package

str_extract_all(text, pat)
## [[1]]
## [1] "6:05, 15 July 2005"
## 
## [[2]]
## [1] "7:06, 3 November 2006"
## 
## [[3]]
## [1] "24 November 2006"
## 
## [[4]]
## character(0)
##
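
As a possible refinement (a sketch, not part of the original answer): the `*` quantifier in the pattern above allows the time prefix to repeat any number of times, whereas the data only ever has zero or one. The variant below makes the prefix optional with `?`, uses non-capturing groups, and adds `\\b` word boundaries so a match cannot begin or end inside a longer number. The names `month_alt` and `pat2` are hypothetical, and `perl = TRUE` is assumed so that the PCRE syntax `(?: ... )` is available.

```r
# Sample input from the question.
text <- c("blabla bla blabla bla 6:05, 15 July 2005, blabla bla",
          "blabla bla bla 7:06, 3 November 2006, blabla bla",
          "blabla bla 24 November 2006, blabla bla",
          "blabla bla blabla bla bla blabla bla")

# month.name is a built-in constant holding the English month names.
month_alt <- paste(month.name, collapse = "|")

# (?:[0-9]{1,2}:[0-9]{2}, )?  optional "H:MM, " prefix, matched at most once
# [0-9]{1,2}                  day of the month
# (?:January|...|December)    full English month name
# [0-9]{4}                    four-digit year
# \\b                         word boundary on both ends of the match
pat2 <- paste0("\\b(?:[0-9]{1,2}:[0-9]{2}, )?[0-9]{1,2} (?:",
               month_alt, ") [0-9]{4}\\b")

# perl = TRUE switches gregexpr to PCRE, which supports (?: ... ) groups.
dates <- regmatches(text, gregexpr(pat2, text, perl = TRUE))
dates
```

On the sample vector this should return the same matches as the original pattern. With stringr, `str_extract_all(text, pat2)` is expected to behave the same way, since its ICU regex engine also supports non-capturing groups and `\\b`.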