Finding months and years in strings

Time: 2017-10-19 20:35:45

Tags: r regex string

I have a column of strings in which months and years are scattered among the entries:

df <- data.frame(STRINGS = c("January 2017 Blah Blah",
                         "February Blah Blah",
                         "2016 Yeah Yeah",
                         "March Bleck",
                         "Stuff"))

> df
                 STRINGS
1 January 2017 Blah Blah
2     February Blah Blah
3         2016 Yeah Yeah
4            March Bleck
5                  Stuff

All years fall in the range 2015 to 2017.

I would like to produce the following output:

                 STRINGS           MONTH         YEAR
1 January 2017 Blah Blah         January         2017
2     February Blah Blah        February           NA
3         2016 Yeah Yeah              NA         2016
4            March Bleck           March           NA
5                  Stuff              NA           NA

What is the simplest way to do this?

To start with, I have

months <- c("January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December")
years <- c(2015, 2016, 2017)

1 answer:

Answer 0: (score: 3)

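A solution using dplyr, rebus, and stringr. Note that it assumes each row contains at most one matching month and one matching year; if a row contains several, only the first match is kept.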

library(dplyr)
library(rebus)
library(stringr)

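df2 <- df %>%
  # STRINGS may be a factor (the data.frame() default before R 4.0), so convert it first
  mutate(STRINGS = as.character(STRINGS)) %>%
  # or1() joins the candidates into a single alternation pattern;
  # str_extract() returns the first match, or NA when nothing matches
  mutate(MONTH = str_extract(STRINGS, or1(months)),
         YEAR = str_extract(STRINGS, or1(years)))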
df2
                 STRINGS    MONTH YEAR
1 January 2017 Blah Blah  January 2017
2     February Blah Blah February <NA>
3         2016 Yeah Yeah     <NA> 2016
4            March Bleck    March <NA>
5                  Stuff     <NA> <NA>
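
For comparison, here is a minimal base R sketch of the same idea, in case adding three packages is not an option. It is not part of the answer above. The helper name extract_first and the \\b word boundaries are my own assumptions: the boundaries keep a year such as 2016 from matching inside a longer number, which may or may not be what you want. Like str_extract(), it keeps only the first match per row.

```r
# Sample data from the question (kept as character, not factor)
df <- data.frame(STRINGS = c("January 2017 Blah Blah",
                             "February Blah Blah",
                             "2016 Yeah Yeah",
                             "March Bleck",
                             "Stuff"),
                 stringsAsFactors = FALSE)

months <- month.name   # built-in vector "January" ... "December"
years  <- 2015:2017

# Build alternation patterns such as "\\b(2015|2016|2017)\\b"
month_pattern <- paste0("\\b(", paste(months, collapse = "|"), ")\\b")
year_pattern  <- paste0("\\b(", paste(years,  collapse = "|"), ")\\b")

# Hypothetical helper: first match of `pattern` in each element of `x`, NA if none
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)   # -1 where there is no match
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)        # regmatches() returns matched elements only
  out
}

result <- df
result$MONTH <- extract_first(result$STRINGS, month_pattern)
result$YEAR  <- as.integer(extract_first(result$STRINGS, year_pattern))
result
```

Here YEAR is converted to integer, whereas the stringr version above leaves it as character; drop the as.integer() call if you prefer strings.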