I was given the following character vector:
"On the evening of 2017-04-23, I was too tired"
"to complete my homework that was due on 24.04.2017."
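I need to find all the dates and replace them with the format Monthname D, YYYY.
I know the general format should be %B %d, %Y and that I probably have to use the sub() function, but I'm not sure how to combine the two.
When I try something like
sub("[0-9]{2}.[0-9]{2}.[0-9]{4}","%B %d, %Y",x)
I get the following result: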
"On the evening of 2001-01-15, I was too tired to complete my homework that was due on %B %d, %Y."
Can someone help me figure out how to put these together?
With help from Stack Overflow users, my new code is as follows:
streamlineDates <- function(x)
{
  # pattern for dates of the form YYYY-MM-DD or DD.MM.YYYY
  pattern <- "\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}"
  y <- c(x)
  val <- unlist(regmatches(y, gregexpr(pattern, y)))
  val1 <- as.Date(val,format=c("%Y-%m-%d","%d.%m.%Y"))
  val2 <- format(val1,"%B %d, %Y")
  y1 <- list()
  for (i in 1:length(y)){
    y1[i] <- gsub(pattern,val2[i],y[i])
  }
  # return the replaced strings as a character vector
  unlist(y1)
}
However, when I enter only:
x <- "to complete my homework that was due on 24.04.2017."
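...it returns only NA. I have narrowed the problem down to gsub, whose documentation says of the replacement value: "If NA, all elements in the result corresponding to matches will be set to NA." So when only the last line is entered, the first date is missing and the function returns just NA.
How can I make it accept either one of the dates or both?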
Answer 0 (score: 2)
First approach:
A base R solution (no packages required):
pattern <- "\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}"
rep <- c("On the evening of 2017-04-23, I was too tired","to complete my homework that was due on 24.04.2017.")
val <- unlist(regmatches(rep, gregexpr(pattern, rep)))
val1 <- as.Date(val,format=c("%Y-%m-%d","%d.%m.%Y"))
val2 <- format(val1,"%B %d, %Y")
val2
rep1 <- list()
for (i in 1:length(rep)){
rep1[i] <- gsub(pattern,val2[i],rep[i])
}
Answer:
> do.call("c",rep1)
[1] "On the evening of April 23, 2017, I was too tired"
[2] "to complete my homework that was due on April 24, 2017."
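As a side note on why the original sub() attempt printed %B %d, %Y literally: sub() inserts its replacement argument verbatim, and the format codes only mean something to format() applied to a Date. A minimal sketch of that difference, assuming a single ISO-style date (the names s, m, d, pretty, literal and result are illustrative and not from the question):

```r
# Illustrative input, shortened from the question's text.
s <- "due on 2017-04-23"
iso <- "[0-9]{4}-[0-9]{2}-[0-9]{2}"

# sub() knows nothing about date formats: the replacement is literal text.
literal <- sub(iso, "%B %d, %Y", s)
literal   # the date is replaced by the characters "%B %d, %Y"

# Instead: extract the date text, parse it, format it, then substitute.
m <- regmatches(s, regexpr(iso, s))   # "2017-04-23"
d <- as.Date(m, format = "%Y-%m-%d")
pretty <- format(d, "%B %d, %Y")      # month name depends on the locale
result <- sub(m, pretty, s, fixed = TRUE)
result
```

The month name produced by %B follows the session's LC_TIME locale, so the exact text can differ from the English output shown in this answer.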
Second approach:
Using the stringr library:
library(stringr)
rep <- c("On the evening of 2017-04-23, I was too tired","to complete my homework that was due on 24.04.2017.")
val <- str_extract(rep,"\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}")
val1 <- as.Date(val,format=c("%Y-%m-%d","%d.%m.%Y"))
val2 <- format(val1,"%B %d, %Y")
rep1 <- str_replace_all(rep,"\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}",val2)
rep1
Answer:
> rep1
[1] "On the evening of April 23, 2017, I was too tired"
[2] "to complete my homework that was due on April 24, 2017."
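Edit: after the OP changed the question slightly, here is a more general solution. It assumes that the month is always in the middle and that the separators are limited to the dash (-) and the dot (.):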
pattern <- "\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}"
rep <- c("On the evening of 2017-04-23, I was too tired","to complete my homework that was due on 24.04.2017.")
val <- unlist(regmatches(rep, gregexpr(pattern, rep)))
# Extract year, month and day, assuming the month is always the middle field
year <- unlist(regmatches(val, gregexpr("\\d{4}", val)))
month <- unlist(regmatches(val, gregexpr("(?<=[.-])\\d{1,2}(?=[.-])", val, perl=TRUE)))
date <- unlist(regmatches(val, gregexpr("(?<=[.-])\\d{2}$|^\\d{2}(?=[.-])", val, perl=TRUE)))
date1 <- paste0(year,"-",month,"-",date)
date1 <- as.Date(date1,"%Y-%m-%d")
val2 <- format(date1,"%B %d, %Y")
rep1 <- list()
for (i in 1:length(rep)){
rep1[i] <- gsub(pattern,val2[i],rep[i])
}
do.call("c",rep1)
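For the single-string input from the question, the NA appears to come from as.Date() rather than from a missing first date: the two-element format vector is recycled, so one date string yields two results, the first of which is NA, and gsub() then propagates that NA. The sketch below is one possible way around this, not the code from this answer. parse_any and streamline_dates are hypothetical helper names, and it assumes only the two date formats from the question occur.

```r
# One string, two formats: the result has length 2 and starts with NA.
one <- "24.04.2017"
parsed <- as.Date(one, format = c("%Y-%m-%d", "%d.%m.%Y"))
is.na(parsed)   # TRUE FALSE

# Hypothetical helper: try each format in turn on every string and keep
# the first one that parses.
parse_any <- function(s, formats = c("%Y-%m-%d", "%d.%m.%Y")) {
  out <- as.Date(rep(NA, length(s)))
  for (f in formats) {
    todo <- is.na(out)
    out[todo] <- as.Date(s[todo], format = f)
  }
  out
}

# Hypothetical helper: replace every date found in every element of x.
streamline_dates <- function(x, pattern = "\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}") {
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    d <- parse_any(hits)
    out <- hits                          # leave unparseable matches as they are
    ok <- !is.na(d)
    out[ok] <- format(d[ok], "%B %d, %Y")
    out
  })
  x
}

streamline_dates("to complete my homework that was due on 24.04.2017.")
```

Because the replacement goes through `regmatches<-`, strings without a date pass through unchanged and a string containing several dates has each one replaced.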
Answer 1 (score: 1)
First, you need to specify all the date formats. Then convert to Date and use format to get the desired output, i.e.
#Note that I don't use any delimiter in the formatting simply because
#I will use gsub to replace all except the numbers with '' from the string
v1 <- c('%Y%m%d', '%d%m%Y')
format(as.Date(gsub('\\D+', '', x), format = v1), "%B %d, %Y")
#[1] "April 23, 2017" "April 24, 2017"
You can then use str_replace_all from the stringr package with a (rather ugly) regular expression, i.e.
stringr::str_replace_all(x, '\\d+-\\d+-\\d+|\\d+\\.\\d+\\.\\d+',
format(as.Date(gsub('\\D+', '', x), format = v1), "%B %d, %Y"))
#[1] "On the evening of April 23, 2017, I was too tired"
#[2] "to complete my homework that was due on April 24, 2017."
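Both calls in this answer pair the formats in v1 with the elements of x by position, so they give the expected result only when the ISO-style date comes first and the dotted date second. A hedged sketch of one way to choose the format per element instead, assuming each string contains exactly one date and no other digits (fmt, digits and dates are illustrative names, and the input order is deliberately reversed):

```r
# Same sentences as in the question, in reverse order.
x <- c("to complete my homework that was due on 24.04.2017.",
       "On the evening of 2017-04-23, I was too tired")

# Strip everything except the digits: "24042017" "20170423"
digits <- gsub("\\D+", "", x)

# Pick the format from the shape of the date in the original string
# rather than from its position in the vector.
fmt <- ifelse(grepl("\\d{4}-\\d{2}-\\d{2}", x), "%Y%m%d", "%d%m%Y")

dates <- as.Date(digits, format = fmt)
format(dates, "%B %d, %Y")   # month names depend on the locale
```

The resulting vector can be passed as the replacement argument in the str_replace_all call in place of the inner format(as.Date(...)) expression.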