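Converting date ranges to the Date type in R

Asked: 2017-04-08 03:56:18

Tags: r data-manipulation

This vector of date ranges is a column in my data frame and has class 'character'. Its format depends on whether the range spans different months: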

dput(pollingdata$dates)
c("Nov. 1-7", "Nov. 1-7", "Oct. 24-Nov. 6", "Oct. 4-Nov. 6", 
"Oct. 30-Nov. 6", "Oct. 25-31", "Oct. 7-27", "Oct. 21-Nov. 3", 
"Oct. 20-24", "Jul. 19", "Oct. 29-Nov. 4", "Oct. 28-Nov. 3", 
"Oct. 27-Nov. 2", "Oct. 20-28", "Sep. 30-Oct. 20", "Oct. 15-19", 
"Oct. 26-Nov. 1", "Oct. 25-31", "Oct. 24-30", "Oct. 18-26", 
"Oct. 10-14", "Oct. 4-9", "Sep. 23-Oct. 6", "Sep. 16-29", "Sep. 2-22", 
"Oct. 21-Nov. 2", "Oct. 17-25", "Sep. 30-Oct. 13", "Sep. 27-Oct. 3", 
"Sep. 21-26", "Sep. 14-20", "Aug. 26-Sep. 15", "Sep. 7-13", 
"Aug. 19-Sep. 8", "Aug. 31-Sep. 6", "Aug. 12-Sep. 1", "Aug. 9-Sep. 1", 
"Aug. 24-30", "Aug. 5-25", "Aug. 17-23", "Jul. 29-Aug. 18", 
"Aug. 10-16", "Jan. 12")

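I would like to convert this vector into two separate columns in my data frame, (1) startdate and (2) enddate, holding the beginning and the end of each range. Both columns should be stored as class 'Date', which would make the data much easier to work with in my project. Does anyone know a simple way to do this? I have been struggling with it.

Thanks in advance,

2 Answers:

Answer 0: (score: 2)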

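We can split the vector on - into a list, replace the last element with the month substring pasted in front of it (using paste) when that element contains only digits, pad the elements that have fewer than 2 parts with NA using `length<-`, and convert the result to a data.frame with do.call(rbind.data.frame, ...):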

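v1 <- pollingdata$dates  # the character vector from the question
# Split each range on "-"; if the end part is only a day number,
# prepend the start month (e.g. "Nov. 1-7" -> "Nov. 1", "Nov. 7")
lst <- lapply(strsplit(v1, "-"), function(x) {
  i1 <- grepl("^[0-9]+", x[length(x)])
  if (i1) {
    x[length(x)] <- paste(substr(x[1], 1, 4), x[length(x)])
  }
  x
})
# Pad single-date entries with NA so every element has two parts
d1 <- do.call(rbind.data.frame, lapply(lst, `length<-`, max(lengths(lst))))
colnames(d1) <- c("Start_Date", "End_Date")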

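According to the OP's post, the columns need to be converted to the Date class, but the Date class follows the format %Y-%m-%d. The vector contains no year, and I am not sure whether it is acceptable to paste in the current year before converting to Date. If it is, each string can be combined with a year and passed to as.Date with a matching format.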
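The original code for this last step did not survive in the thread, so the following is only a minimal sketch of what such a conversion could look like, not the answerer's exact code. It assumes the year 2016 (the strings carry no year), uses a three-element sample of the question's vector, and forces the "C" time locale because %b matches month abbreviations in the current locale.

```r
# Self-contained sketch on a small sample of the strings from the question
v1 <- c("Nov. 1-7", "Oct. 24-Nov. 6", "Jul. 19")

# %b is locale-dependent; the "C" locale uses English month abbreviations
invisible(Sys.setlocale("LC_TIME", "C"))

# Split on "-" and prepend the start month when the end part is only a day
lst <- lapply(strsplit(v1, "-"), function(x) {
  if (grepl("^[0-9]+$", x[length(x)])) {
    x[length(x)] <- paste(substr(x[1], 1, 4), x[length(x)])
  }
  length(x) <- 2  # pad single dates with NA
  x
})
d1 <- as.data.frame(do.call(rbind, lst), stringsAsFactors = FALSE)
colnames(d1) <- c("Start_Date", "End_Date")

assumed_year <- 2016  # ASSUMPTION: not present in the data

# "Nov. 1" + year -> "Nov. 1 2016", parsed with "%b. %d %Y"
d1[] <- lapply(d1, function(x) {
  as.Date(paste(x, assumed_year), format = "%b. %d %Y")
})
```

With this approach, entries that hold a single date (such as "Jul. 19") keep NA in End_Date, because the padded NA does not parse as a date.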

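Answer 1: (score: 1)

You can use the str_split_fixed function from the stringr library to split the field and then process the pieces. Load stringr and set up the data as follows: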

library(stringr)
    dat <- data.frame(date=c("Nov. 1-7", "Nov. 1-7", "Oct. 24-Nov. 6", "Oct. 4-Nov. 6", 
              "Oct. 30-Nov. 6", "Oct. 25-31", "Oct. 7-27", "Oct. 21-Nov. 3", 
              "Oct. 20-24", "Jul. 19", "Oct. 29-Nov. 4", "Oct. 28-Nov. 3", 
              "Oct. 27-Nov. 2", "Oct. 20-28", "Sep. 30-Oct. 20", "Oct. 15-19", 
              "Oct. 26-Nov. 1", "Oct. 25-31", "Oct. 24-30", "Oct. 18-26", 
              "Oct. 10-14", "Oct. 4-9", "Sep. 23-Oct. 6", "Sep. 16-29", "Sep. 2-22", 
              "Oct. 21-Nov. 2", "Oct. 17-25", "Sep. 30-Oct. 13", "Sep. 27-Oct. 3", 
              "Sep. 21-26", "Sep. 14-20", "Aug. 26-Sep. 15", "Sep. 7-13", 
              "Aug. 19-Sep. 8", "Aug. 31-Sep. 6", "Aug. 12-Sep. 1", "Aug. 9-Sep. 1", 
              "Aug. 24-30", "Aug. 5-25", "Aug. 17-23", "Jul. 29-Aug. 18", 
              "Aug. 10-16", "Jan. 12"))

Processing:

# Split on space or dash into (at most) 4 pieces
dt <- data.frame(str_split_fixed(dat$date, "[-]|\\s",4))
names(dt) <- c("stdt1","stdt2","endt1","endt2")
# Remove the dots (.) from the month abbreviations
dt1 <- data.frame(sapply(dt,function(x)gsub("[.]","",x)))
# Start date: day + month + assumed year
dt1$stdt <- as.Date(paste0(dt1$stdt2,dt1$stdt1,"2016"),format="%d%b%Y")
# End date: reuse the start month when the range stays within one month
dt1$endt <- ifelse(dt1$endt2=="",paste0(dt1$endt1,dt1$stdt1,"2016"),
              paste0(dt1$endt2,dt1$endt1,"2016"))

# Single dates produce only "MonYYYY" (7 characters); prepend the start day
dt1$endt <-as.Date(ifelse(nchar(dt1$endt)==7,paste0(dt1$stdt2,dt1$endt),dt1$endt),"%d%b%Y")

Assumptions:

1) No year is provided, so I used 2016 as the year.

2) Rows 10 and 43 have no end "day", so I assumed the end date is the same as the start date.

Output:
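Both assumptions can be checked once the columns are of class Date. The sketch below is illustrative only: the data frame rng and its three rows are hand-written stand-ins for a few parsed rows, not output of the code above. It flags ranges that would run backwards (which could happen with a single assumed year if a range crossed New Year) and marks the single-day rows.

```r
# Hypothetical stand-in for a few parsed rows (start and end as Date)
rng <- data.frame(
  stdt = as.Date(c("2016-11-01", "2016-10-24", "2016-07-19")),
  endt = as.Date(c("2016-11-07", "2016-11-06", "2016-07-19"))
)

# Assumption 1: with one assumed year, no range should end before it starts
backwards <- rng$endt < rng$stdt

# Assumption 2: single-date rows have identical start and end
single_day <- rng$stdt == rng$endt

# Length of each range in days, counting both endpoints
rng$days <- as.integer(rng$endt - rng$stdt) + 1L
```

Any TRUE in backwards would point to a row where the assumed year needs a second look.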

> dt1
   stdt1 stdt2 endt1 endt2       stdt       endt
1    Nov     1     7       2016-11-01 2016-11-07
2    Nov     1     7       2016-11-01 2016-11-07
3    Oct    24   Nov     6 2016-10-24 2016-11-06
4    Oct     4   Nov     6 2016-10-04 2016-11-06
5    Oct    30   Nov     6 2016-10-30 2016-11-06
6    Oct    25    31       2016-10-25 2016-10-31
7    Oct     7    27       2016-10-07 2016-10-27
8    Oct    21   Nov     3 2016-10-21 2016-11-03
9    Oct    20    24       2016-10-20 2016-10-24
10   Jul    19             2016-07-19 2016-07-19