How do I create a date sequence in dplyr?

Time: 2017-12-16 12:02:58

Tags: r dplyr

I have a dataset that looks like this:

dt <- structure(list(servicerequestid = c("254475", "255470", "249438", 
"249398", "249399"), createdate = structure(c(1471592400, 1471874280, 
1470037140, 1470028740, 1470031020), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), closedate = structure(c(1473661860, 1472457480, 1470641700, 
1491918180, 1470293940), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-5L), .Names = c("servicerequestid", "createdate", "closedate"
))

# A tibble: 5 x 3
  servicerequestid          createdate           closedate
             <chr>              <dttm>              <dttm>
1           254475 2016-08-19 07:40:00 2016-09-12 06:31:00
2           255470 2016-08-22 13:58:00 2016-08-29 07:58:00
3           249438 2016-08-01 07:39:00 2016-08-08 07:35:00
4           249398 2016-08-01 05:19:00 2017-04-11 13:43:00
5           249399 2016-08-01 05:57:00 2016-08-04 06:59:00

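Each servicerequestid is the ID of a service request that stayed open from createdate to closedate. I want to transform this dataset so that each servicerequestid has one observation per day the ticket stayed open, each with the corresponding date.

For example, for servicerequestid == 255470, the dataset would look like: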

# A tibble: 8 x 2
  servicerequestid       date
             <dbl>     <date>
1           255470 2016-08-22
2           255470 2016-08-23
3           255470 2016-08-24
4           255470 2016-08-25
5           255470 2016-08-26
6           255470 2016-08-27
7           255470 2016-08-28
8           255470 2016-08-29

I am trying the following code, but it does not work:

dt %>%
  mutate(seq.Date(as.Date(createdate), as.Date(closedate), by="days"))

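Some background: I am trying to create an animated density map in ggplot, and I think one possible approach is to create daily observations. That way, for each day I should be able to see the number of open tickets.

2 answers:

Answer 0: (score: 6)

Here is one approach: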

library(tidyverse)
dt %>%
  mutate_if(~inherits(.x, "POSIXct"), as.Date) %>% # convert posix cols to date
  gather(var, date, -1) %>%                        # wide to long format 
  select(-var) %>%                                 # we don't need this 
  group_by(servicerequestid) %>%                   # for every id...
  expand(date = full_seq(date, 1)) %>%             # create the date range
  filter(servicerequestid == 255470)               # Then grab the example one
# # A tibble: 8 x 2
# # Groups: servicerequestid [1]
# servicerequestid date      
# <chr>            <date>    
# 1 255470           2016-08-22
# 2 255470           2016-08-23
# 3 255470           2016-08-24
# 4 255470           2016-08-25
# 5 255470           2016-08-26
# 6 255470           2016-08-27
# 7 255470           2016-08-28
# 8 255470           2016-08-29
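
For context on why the pipeline above builds the range per group rather than calling seq.Date() inside mutate() directly: seq.Date() expects a single start and a single end date, so handing it whole columns raises an error. The sketch below uses base R only and two hand-typed rows taken from the question's data; the names create, close, vectorised and per_row are illustrative and not part of the original code.

```r
# Two rows from the question's data, already reduced to Date (illustrative names).
create <- as.Date(c("2016-08-22", "2016-08-01"))
close  <- as.Date(c("2016-08-29", "2016-08-04"))

# seq.Date() wants scalar endpoints, so passing full columns fails.
# tryCatch() captures the error message instead of stopping the script.
vectorised <- tryCatch(
  seq.Date(create, close, by = "day"),
  error = function(e) conditionMessage(e)
)

# Calling it once per row gives one Date vector per request.
per_row <- Map(function(from, to) seq.Date(from, to, by = "day"), create, close)
lengths(per_row)  # 8 and 4 days, counting both endpoints
```

Any solution therefore has to apply the sequence row by row or group by group, which is what expand() with full_seq() does here.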

Answer 1: (score: 1)

Another tidyverse solution.

library(tidyverse)
dt2 <- dt %>%
  mutate_at(vars(ends_with("date")), as.Date) %>%                  # Convert date-time class to Date class (funs() is deprecated)
  mutate(date = map2(createdate, closedate, seq.Date, by = 1)) %>% # Create a list column with dates
  unnest(date) %>%                                                 # Expand based on the list column
  select(servicerequestid, date) %>%                               # Select the desired columns
  filter(servicerequestid == 255470)                               # Filter for servicerequestid 255470
dt2
# # A tibble: 8 x 2
#   servicerequestid       date
#              <chr>     <date>
# 1           255470 2016-08-22
# 2           255470 2016-08-23
# 3           255470 2016-08-24
# 4           255470 2016-08-25
# 5           255470 2016-08-26
# 6           255470 2016-08-27
# 7           255470 2016-08-28
# 8           255470 2016-08-29
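
Tying this back to the background in the question (the number of open tickets per day), the expanded rows can be counted by date. This is a minimal sketch, not part of the original answer: it assumes three requests from the question already converted to Date, it treats the close date as a day on which the ticket is still open, and the names open_per_day and open_tickets are made up for illustration.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Three requests from the question, with the time of day dropped (assumption).
dt_days <- tibble(
  servicerequestid = c("249438", "249399", "255470"),
  createdate = as.Date(c("2016-08-01", "2016-08-01", "2016-08-22")),
  closedate  = as.Date(c("2016-08-08", "2016-08-04", "2016-08-29"))
)

open_per_day <- dt_days %>%
  mutate(date = map2(createdate, closedate, seq.Date, by = "day")) %>% # one Date vector per request
  unnest(date) %>%                                                     # one row per request and day
  count(date, name = "open_tickets")                                   # tickets open on each day

open_per_day
```

Days on which no ticket is open do not appear in the result; if the plot needs them as zeros, they would have to be added separately, for example with tidyr::complete().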