Parsing a date range with strptime and making a date column

Time: 2017-11-09 18:46:05

Tags: r dplyr lubridate stringr

My dates are in the following format:

Date                      Value
<chr>                      <dbl>
[2014-1-24 - 2014-2-2]      1.1
[2014-2-3 - 2014-3-2]       2.2
.                           .
.                           .
.                           .

This continues for many years. I want to convert it to long format, as shown below:

Date          Value
<date>        <dbl>
2014-01-24     1.1
2014-01-25     1.1
2014-01-26     1.1
2014-01-27     1.1
2014-01-28     1.1
2014-01-29     1.1
2014-01-30     1.1
2014-01-31     1.1
2014-02-01     1.1
2014-02-02     1.1
2014-02-03     2.2
2014-02-04     2.2
2014-02-05     2.2
.               .
.               .
.               .

What is a clean way to achieve this?

2 answers:

Answer 0: (score: 1)

Using dplyr and tidyr:

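library(dplyr); library(tidyr); library(stringr)  # str_match_all() comes from stringr

df %>% 
    # pull the start and end date strings out of each range
    mutate(Date = str_match_all(Date, '\\d{4}-\\d{1,2}-\\d{1,2}'), 
           # expand each pair into a daily sequence (a list-column of Dates)
           Date = lapply(Date, function(d) seq(as.Date(d[1]), as.Date(d[2]), by='day'))) %>% 
    # name the list-column explicitly; a bare unnest() is deprecated in tidyr >= 1.0
    unnest(Date) 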

#   Value       Date
#1    1.1 2014-01-24
#2    1.1 2014-01-25
#3    1.1 2014-01-26
#4    1.1 2014-01-27
#5    1.1 2014-01-28
#6    1.1 2014-01-29
#7    1.1 2014-01-30
#8    1.1 2014-01-31
#9    1.1 2014-02-01
#10   1.1 2014-02-02
#11   2.2 2014-02-03
#12   2.2 2014-02-04
# ...

Using purrr:

library(stringr); library(purrr); library(tibble)  # tibble() replaces the deprecated data_frame()

# extract the start and end date from each Date string
df$Date <- map(str_match_all(df$Date, '\\d{4}-\\d{1,2}-\\d{1,2}'), as.Date)

# map over rows and expand each date range into a sequence using seq.Date
pmap_df(df, ~ tibble(Date = seq(.x[1], .x[2], by='day'), Value = .y))

# A tibble: 38 x 2
#         Date Value
#       <date> <dbl>
# 1 2014-01-24   1.1
# 2 2014-01-25   1.1
# 3 2014-01-26   1.1
# 4 2014-01-27   1.1
# 5 2014-01-28   1.1
# 6 2014-01-29   1.1
# 7 2014-01-30   1.1
# 8 2014-01-31   1.1
# 9 2014-02-01   1.1
#10 2014-02-02   1.1
# ... with 28 more rows
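
For comparison, the same extract-then-expand idea can be sketched in base R with no extra packages. This is not part of the original answer: the sample data frame `df` below is an assumption that reproduces only the two rows shown in the question, and `bounds`, `pieces` and `long` are illustrative names.

```r
# Sample input; `df` and its two rows mirror the question (assumed, not the full data).
df <- data.frame(
  Date  = c("[2014-1-24 - 2014-2-2]", "[2014-2-3 - 2014-3-2]"),
  Value = c(1.1, 2.2),
  stringsAsFactors = FALSE
)

# Pull every yyyy-m-d token out of each string: one character vector per row.
bounds <- regmatches(df$Date, gregexpr("[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}", df$Date))

# Expand each (start, end) pair into a daily sequence, carrying Value along.
pieces <- Map(
  function(b, v) {
    data.frame(Date = seq(as.Date(b[1]), as.Date(b[2]), by = "day"), Value = v)
  },
  bounds, df$Value
)

# Stack the per-row pieces into one long data frame.
long <- do.call(rbind, pieces)
rownames(long) <- NULL

nrow(long)  # 38 for these two rows (10 days + 28 days)
```

The bracket characters never need to be stripped here, because the pattern matches only the date tokens inside them.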

Answer 1: (score: 1)

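Here is an option using data.table and lubridate. Group by 'Value' (assuming it is unique; if it is not, group by the row sequence instead), split 'Date' into two columns with tstrsplit, convert them to Date class with ymd (from lubridate), and use Reduce to get the sequence of dates.
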
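library(data.table)
library(lubridate)
# per Value group: split the range on " - ", parse both ends with ymd(),
# then Reduce() passes them to seq() as the start and end of a daily sequence
setDT(df1)[, .(Date = Reduce(function(...) seq(..., by = '1 day'), 
               lapply(tstrsplit(Date, "\\s-\\s"), ymd))), Value][, .(Date, Value)]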
#          Date Value
# 1: 2014-01-24   1.1
# 2: 2014-01-25   1.1
# 3: 2014-01-26   1.1
# 4: 2014-01-27   1.1
# 5: 2014-01-28   1.1
# 6: 2014-01-29   1.1
# 7: 2014-01-30   1.1
# 8: 2014-01-31   1.1
# 9: 2014-02-01   1.1
#10: 2014-02-02   1.1
#11: 2014-02-03   2.2
#12: 2014-02-04   2.2
#13: 2014-02-05   2.2
#14: 2014-02-06   2.2
# - -
# - -
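
The answer notes that grouping by 'Value' only works when its values are unique. A minimal sketch of the row-sequence alternative follows, under some assumptions: the sample table is hypothetical (two ranges sharing one Value), the helper columns `start_date` and `end_date` are illustrative names, and base `as.Date` is swapped in for `ymd` after the brackets are stripped with `gsub`, so the sketch does not rely on how lubridate treats the bracket characters.

```r
library(data.table)

# Hypothetical sample: two ranges that share the same Value.
df1 <- data.table(
  Date  = c("[2014-1-24 - 2014-1-26]", "[2014-2-3 - 2014-2-4]"),
  Value = c(1.1, 1.1)
)

# Drop the brackets, then split each range into start and end columns.
df1[, c("start_date", "end_date") := tstrsplit(gsub("\\[|\\]", "", Date), "\\s-\\s")]

# Group by row number rather than by Value, so rows with equal Values
# are expanded one at a time instead of being merged into one group.
out <- df1[, .(Date = seq(as.Date(start_date), as.Date(end_date), by = "1 day"),
               Value = Value),
           by = .(row = seq_len(nrow(df1)))][, .(Date, Value)]

nrow(out)  # 5 for this sample (3 days + 2 days)
```

Grouping by 'Value' on this sample would hand `seq()` two start dates at once, which is why the row index is used as the grouping key instead.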