I have a column of strings with months and years scattered through its entries:
df <- data.frame(STRINGS = c("January 2017 Blah Blah",
                             "February Blah Blah",
                             "2016 Yeah Yeah",
                             "March Bleck",
                             "Stuff"))
> df
                 STRINGS
1 January 2017 Blah Blah
2     February Blah Blah
3         2016 Yeah Yeah
4            March Bleck
5                  Stuff
All years fall in the range 2015 to 2017.
I would like to produce the following output:
                 STRINGS    MONTH YEAR
1 January 2017 Blah Blah  January 2017
2     February Blah Blah February   NA
3         2016 Yeah Yeah       NA 2016
4            March Bleck    March   NA
5                  Stuff       NA   NA
What is the simplest way to do this?
To start with, I have:
months <- c("January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December")
years <- c(2015, 2016, 2017)
Answer 0 (score: 3)
A solution using dplyr, rebus, and stringr. Note that it assumes each row contains at most one matching month and one matching year.
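library(dplyr)
library(rebus)
library(stringr)

# or1() builds a single regex alternation from a vector of candidates;
# str_extract() returns the first match in each string, or NA if none
df2 <- df %>%
  mutate(STRINGS = as.character(STRINGS)) %>%
  mutate(MONTH = str_extract(STRINGS, or1(months)),
         YEAR = str_extract(STRINGS, or1(years)))
df2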
                 STRINGS    MONTH YEAR
1 January 2017 Blah Blah  January 2017
2     February Blah Blah February <NA>
3         2016 Yeah Yeah     <NA> 2016
4            March Bleck    March <NA>
5                  Stuff     <NA> <NA>
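
For comparison, the same extraction can be sketched in base R, without rebus or stringr. This is only a rough sketch and not part of the original answer. `extract_first` is a hypothetical helper, `month.name` is base R's built-in vector of English month names (used here in place of `months`), and the `\\b` word boundaries are an added assumption so that a month or year is not matched inside a longer token. Like `str_extract`, it keeps only the first match in each row.

```r
# Sample data and year range from the question
df <- data.frame(STRINGS = c("January 2017 Blah Blah",
                             "February Blah Blah",
                             "2016 Yeah Yeah",
                             "March Bleck",
                             "Stuff"),
                 stringsAsFactors = FALSE)
years <- c(2015, 2016, 2017)

# Hypothetical helper: first match of `pattern` in each element of `x`,
# or NA where the pattern does not match
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- as.vector(m) != -1          # regexpr() returns -1 for "no match"
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)       # regmatches() returns matched rows only
  out
}

# Build alternation patterns such as "\\b(2015|2016|2017)\\b"
month_pattern <- paste0("\\b(", paste(month.name, collapse = "|"), ")\\b")
year_pattern  <- paste0("\\b(", paste(years, collapse = "|"), ")\\b")

df$MONTH <- extract_first(df$STRINGS, month_pattern)
df$YEAR  <- extract_first(df$STRINGS, year_pattern)
df
```

Both new columns come back as character vectors, as in the stringr version. If a numeric year is needed, wrapping the result in `as.integer()` should work.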