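I have a column in the following format: some entries are a date, some are a number followed by a date, and some are NA. I only want to keep the date part and remove the leading numbers in R, possibly using sub or gsub? Happy to accept an answer that helps me :)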
df <- data.frame(a=c(1:11), datecol=c("11 June 2018", NA, NA, "400 10 June 2017",NA,"5 05 June 2018", NA, NA, NA, NA, "25 15 May 2016"))
df.desired <- data.frame(a=c(1:11), datecol=c("11 June 2018", NA, NA, "10 June 2017",NA,"05 June 2018", NA, NA, NA, NA, "15 May 2016"))
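Answer 0 (score: 4)
We can use sub to match a pattern of 1 or 2 digits (\\d{1,2}) followed by a space, a word (\\w+) for the month, a space, and finally 4 digits for the year. This part is captured as a group, and the replacement uses a backreference (\\1) to the captured group. The leading .*\\s+ consumes the extra number in front of the date. Entries without a leading number do not match the pattern, so sub returns them unchanged.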
sub(".*\\s+(\\d{1,2}.*\\w+\\s+\\d{4}$)", "\\1", df$datecol)
#[1] "11 June 2018" NA NA "10 June 2017" NA
#[6] "05 June 2018" NA NA NA NA
#[11] "15 May 2016"
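The point about non-matching entries is worth checking in isolation. The sketch below reuses the same pattern on a shorter vector; the "no date here" element is a hypothetical input that is not in the question, added only to show that sub passes through anything it cannot match.

```r
# Hypothetical input: the last element contains no date at all
x <- c("11 June 2018", "400 10 June 2017", NA, "no date here")

# Same pattern as in the answer above
res <- sub(".*\\s+(\\d{1,2}.*\\w+\\s+\\d{4}$)", "\\1", x)

# Only the second element is rewritten; NA stays NA and
# non-matching strings come back as they were
print(res)
```

So if the column might contain strings that are not dates, this approach will keep them rather than turn them into NA.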
Answer 1 (score: 2)
You can also use the stringr package:
stringr::str_extract(df$datecol,"\\d{1,2}\\s+[a-zA-Z]+\\s+\\d{4}")
Output:
> stringr::str_extract(df$datecol,"\\d{1,2}\\s+[a-zA-Z]+\\s+\\d{4}")
[1] "11 June 2018" NA NA "10 June 2017"
[5] NA "05 June 2018" NA NA
[9] NA NA "15 May 2016"
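If installing stringr is not an option, roughly the same extraction can be done in base R with regexpr and regmatches. This is a sketch of my own rather than part of the answer above; the names datecol, pattern, found and dates are just illustrative, and the data is copied from the question.

```r
# Data from the question, as a plain character vector
datecol <- c("11 June 2018", NA, NA, "400 10 June 2017", NA,
             "5 05 June 2018", NA, NA, NA, NA, "25 15 May 2016")

# Same pattern as in the str_extract call
pattern <- "\\d{1,2}\\s+[a-zA-Z]+\\s+\\d{4}"

# regexpr gives the start of the first match, -1 for no match, NA for NA input
m <- regexpr(pattern, datecol)
found <- !is.na(m) & m > -1L

# regmatches returns only the matched pieces, so fill them into an NA vector
dates <- rep(NA_character_, length(datecol))
dates[found] <- regmatches(datecol, m)
print(dates)
```

Like str_extract, this yields NA for entries with no date, which differs from sub returning the original string.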