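I'm new to R, please help. I have a data frame with 5 columns named Seasondate, V1, V2, V3 and V4. The Seasondate column holds dates in different formats across roughly 1000 observations, for example: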
January to March
August to October
05/01/2013 to 10/30/2013
NA
February to June
02/15/2013 to 06/19/2013
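I would like to convert all of these into a single format, for example "month to month" for every entry.
A solution that parses the values with string functions would be very welcome.
Edit 1:
All of these dates are in the same year, 2013. Thanks.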
Answer 0 (score: 2)
Use as.Date and format to convert back and forth, then paste the pieces back together:
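datext <- function(x) {
  # Try to parse each piece as a mm/dd/yyyy date; month names come back as NA
  dates <- as.Date(x, format = "%m/%d/%Y")
  # If nothing parsed, keep the original text; otherwise return the full
  # month names (%B is locale-dependent)
  if (all(is.na(dates))) x else format(dates, "%B")
}
# Split each entry on " to ", convert both halves, and glue them back together
vapply(
  lapply(strsplit(as.character(dat$Seasondate), " to "), datext),
  paste, collapse = " to ", FUN.VALUE = character(1)
)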
#[1] "January to March"  "August to October" "May to October"
#[4] "NA"                "February to June"  "February to June"
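Note that the fourth element above is the string "NA", not a missing value, because paste turns NA into text. A minimal sketch of one way to keep real missing values is shown here; the names season and to_month are made up for illustration, and month.name is used in place of format(dates, "%B") so the result does not depend on the locale.

```r
# Sample values copied from the question (hypothetical vector name)
season <- c("January to March", "August to October",
            "05/01/2013 to 10/30/2013", NA,
            "February to June", "02/15/2013 to 06/19/2013")

# Convert the pieces of one entry: numeric dates become English month
# names, anything else is returned unchanged
to_month <- function(x) {
  dates <- as.Date(x, format = "%m/%d/%Y")
  if (all(is.na(dates))) x else month.name[as.integer(format(dates, "%m"))]
}

pieces <- strsplit(season, " to ", fixed = TRUE)
result <- vapply(pieces,
                 function(p) paste(to_month(p), collapse = " to "),
                 character(1))

# paste() produced the text "NA" for missing inputs; restore a real NA
result[is.na(season)] <- NA_character_
result
```

The last assignment relies on the original vector to decide which entries were missing, so a genuine text value that happens to read "NA" would not be affected.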
Answer 1 (score: 1)
Here is another idea that does not use date coercion but relies on base R's month.name vector.
## change the column to character
df$V1 <- as.character(df$V1)
## find the numeric values
g <- grepl("\\d", df$V1)
## split them, get the months, then apply 'month.name' and paste
df$V1[g] <- vapply(strsplit(df$V1[g], " to "), function(x) {
  paste(month.name[as.integer(sub("/.*", "", x))], collapse = " to ")
}, "")
which results in
df
                 V1
1  January to March
2 August to October
3    May to October
4              <NA>
5  February to June
6  February to June
Original data:
df <- structure(list(V1 = structure(c(5L, 3L, 2L, NA, 4L, 1L), .Label = c("02/15/2013 to 06/19/2013",
"05/01/2013 to 10/30/2013", "August to October", "February to June",
"January to March"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA,
-6L))
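With about 1000 observations it can help to confirm that every value ended up in the "Month to Month" form. The following is only a sketch of such a check; the vector cleaned and its deliberately misspelled last entry are made up for illustration.

```r
# Values in the target format, plus one made-up bad entry at the end
cleaned <- c("January to March", "August to October", "May to October",
             NA, "February to June", "Febuary to June")

# Build a pattern that accepts "<month> to <month>" with English month names
months  <- paste(month.name, collapse = "|")
pattern <- sprintf("^(%s) to (%s)$", months, months)

# Missing values are accepted; everything else must match the pattern
ok <- is.na(cleaned) | grepl(pattern, cleaned)

# Positions of entries that still need attention
which(!ok)
```

Any index reported by which(!ok) points at an entry that neither answer's conversion handled, such as a typo or an unexpected date layout.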