Extracting years from strings and text data

Posted: 2016-02-29 21:49:27

Tags: regex r lubridate stringi

I need to extract the start year and end year from a vector whose values look like this.

 yr <- c("June 2013 – Present (2 years 9 months)",
         "January 2012 – June 2013 (1 year 6 months)",
         "2006 – Present (10 years)",
         "2002 – 2006 (4 years)")


 yr
 June 2013 – Present (2 years 9 months)
 January 2012 – June 2013 (1 year 6 months)
 2006 – Present (10 years)
 2002 – 2006 (4 years)

I am expecting output like this. Does anyone have a suggestion?

 start_yr   end_yr
 2013       2016
 2012       2013
 2006       2016
 2002       2006

2 answers:

Answer 0 (score: 4)

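# Treat "Present" as the current year (2016 at the time of writing)
x <- gsub("present", "2016", yr, ignore.case = TRUE)
# Collect every 4-digit run in each string (returns a list of character vectors)
x <- regmatches(x, gregexpr("\\d{4}", x))
# First match is the start year, second is the end year (both as character)
start_yr <- sapply(x, "[[", 1)
end_yr <- sapply(x, "[[", 2)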

This saves the start and end years in two separate variables. If you want them in a data frame instead, edit the code to assign them to y$start_yr and y$end_yr.
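For what it's worth, the data frame variant mentioned above could look roughly like the sketch below. The helper name `extract_years` and its `present_year` argument are made up for illustration and are not part of the answer's code; the sketch also assumes you want integer columns and an NA where a string contains fewer than two 4-digit numbers.

```r
# Hypothetical helper, not from the original answer: wraps the same
# gsub/regmatches approach and returns a data frame.
extract_years <- function(x, present_year = 2016L) {
  # Replace "Present" (any case) with the assumed current year
  x <- gsub("present", as.character(present_year), x, ignore.case = TRUE)
  # Collect every 4-digit run in each string
  yrs <- regmatches(x, gregexpr("\\d{4}", x))
  # Return the i-th match as an integer, or NA if there are fewer matches
  pick <- function(m, i) if (length(m) >= i) as.integer(m[i]) else NA_integer_
  data.frame(
    start_yr = vapply(yrs, pick, integer(1), i = 1),
    end_yr   = vapply(yrs, pick, integer(1), i = 2)
  )
}

yr <- c("June 2013 – Present (2 years 9 months)",
        "January 2012 – June 2013 (1 year 6 months)",
        "2006 – Present (10 years)",
        "2002 – 2006 (4 years)")
y <- extract_years(yr)
y
```

Passing the year as an argument avoids hard-coding 2016 inside the function; something like `as.integer(format(Sys.Date(), "%Y"))` could be supplied if "Present" should track the date the code is run.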

Answer 1 (score: 0)

Another solution is to use stringr:

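library(stringr)
# Pass the replacement as a string; stringr expects a character value here
x <- str_replace(yr, "Present", "2016")
# simplify = TRUE returns a character matrix with one row per input string
DF <- as.data.frame(str_extract_all(x, "\\d{4}", simplify = TRUE))
names(DF) <- c("start_yr", "end_yr")
DF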

You will get

      start_yr end_yr
1     2013   2016
2     2012   2013
3     2006   2016
4     2002   2006