Data reshaping and grouping

Time: 2015-10-10 23:28:52

Tags: r string reshape

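I'm new to R, please help. I have a data frame with 5 columns named Seasondate, V1, V2, V3, and V4. The Seasondate column holds dates in several different formats across roughly 1000 observations, for example: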

January to March 
August to October 
05/01/2013 to 10/30/2013
NA
February to June 
02/15/2013 to 06/19/2013

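I want to convert all of these to a single format, for example by expressing every entry as month to month.

A solution that parses the values with string functions would be very welcome.

Edit 1:

All of these dates fall in the same year, 2013. Thanks.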

2 answers:

Answer 0: (score: 2)

Use as.Date and format to convert back and forth, then paste the pieces back together:

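# Convert mm/dd/yyyy strings to full month names; leave other strings unchanged
datext <- function(x) {
  dates <- as.Date(x,format="%m/%d/%Y")
  if(all(is.na(dates))) x else format(dates,"%B")
}
# Split each entry on " to ", convert both halves, then glue them back together
vapply(
  lapply(strsplit(as.character(dat$Seasondate), " to "), datext), 
  paste, collapse=" to ", FUN.VALUE=character(1)
)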
#[1] "January to March"  "August to October" "May to October"    
#[4] "NA"                "February to June"  "February to June" 
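
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample `dat` below is an assumption built from the six example values in the question, and two details differ from the code above: month names come from `month.name` rather than `format(dates, "%B")` so the result does not depend on the session locale, and the missing entry is reset to a real NA instead of the string "NA".

```r
# Hypothetical sample data mirroring the examples in the question
dat <- data.frame(
  Seasondate = c("January to March", "August to October",
                 "05/01/2013 to 10/30/2013", NA,
                 "February to June", "02/15/2013 to 06/19/2013"),
  stringsAsFactors = FALSE
)

# Turn mm/dd/yyyy strings into English month names; pass other strings through
datext <- function(x) {
  dates <- as.Date(x, format = "%m/%d/%Y")
  if (all(is.na(dates))) x else month.name[as.integer(format(dates, "%m"))]
}

# Split each entry on " to ", convert both halves, and paste them back together
parts <- strsplit(as.character(dat$Seasondate), " to ")
out <- vapply(lapply(parts, datext), paste, character(1), collapse = " to ")

# paste() turns a missing value into the string "NA"; restore a real NA
out[is.na(dat$Seasondate)] <- NA_character_
out
```

If the result looks right, it can be assigned back with `dat$Seasondate <- out`.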

Answer 1: (score: 1)

Here is another idea that does not use date coercion, but instead uses base R's month.name vector.

## change the column to character
df$V1 <- as.character(df$V1)
## find the numeric values
g <- grepl("\\d", df$V1)
## split them, get the months, then apply 'month.name' and paste
df$V1[g] <- vapply(strsplit(df$V1[g], " to "), function(x) {
    paste(month.name[as.integer(sub("/.*", "", x))], collapse = " to ")
}, "")

which results in

df
                 V1
1  January to March
2 August to October
3    May to October
4              <NA>
5  February to June
6  February to June
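
To see why this works, the body of the anonymous function can be traced by hand on a single entry. This is only an illustrative sketch; the variable names `x`, `mon`, `idx`, and `result` are chosen here and are not part of the answer.

```r
# One numeric entry, already split on " to "
x <- c("05/01/2013", "10/30/2013")

# Drop everything from the first slash onward, leaving the month number as text
mon <- sub("/.*", "", x)

# Convert to integer so it can index the built-in month.name vector
idx <- as.integer(mon)

# Look up the month names and join them back into a single string
result <- paste(month.name[idx], collapse = " to ")
result
```

Entries that contain no digits are never touched, because `grepl("\\d", ...)` returns FALSE for them (and for NA), so they are excluded from the subset that gets rewritten.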

Original data:

df <- structure(list(V1 = structure(c(5L, 3L, 2L, NA, 4L, 1L), .Label = c("02/15/2013 to 06/19/2013", 
"05/01/2013 to 10/30/2013", "August to October", "February to June", 
"January to March"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA, 
-6L))