Complete a time series by group

Time: 2019-05-22 12:53:15

Tags: r dplyr time-series

I have a data frame:

dat <- data.frame(c("G", "G", "G", "G"), c("G1", "G1", "G2", "G2"), c('2017-01-01', '2017-01-03', '2017-04-02', '2017-04-05'))

colnames(dat) <- c('Country', 'Place', 'date')

I would like the following output (complete dates for each Country/Place group):

dat <- data.frame(c("G", "G", "G", "G", "G", "G", "G"),
                  c("G1","G1", "G1", "G2", "G2", "G2", "G2"), 
                  c('2017-01-01', '2017-01-02', '2017-01-03',
                    '2017-04-02', '2017-04-03', '2017-04-04', '2017-04-05'))

I tried:

dat = dat %>% group_by(Country, Place) %>% complete(date)

But it does not fill in the missing dates. Can anyone help?

2 answers:

Answer 0: (score: 3)

You can do this:

library(dplyr)
library(tidyr)

dat %>%
  mutate(date = as.Date(date)) %>%  # convert to Date so a daily sequence can be built
  group_by(Country, Place) %>%
  complete(date = seq.Date(min(date), max(date), by = "day"))


# A tibble: 7 x 3
# Groups:   Country, Place [2]
  Country Place date      
  <fct>   <fct> <date>    
1 G       G1    2017-01-01
2 G       G1    2017-01-02
3 G       G1    2017-01-03
4 G       G2    2017-04-02
5 G       G2    2017-04-03
6 G       G2    2017-04-04
7 G       G2    2017-04-05
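
As for why the attempt in the question adds nothing: when `complete()` is given a bare column, it expands over the values already present in each group, so no missing days are created. The sketch below contrasts the two calls. It rebuilds the question's data with `stringsAsFactors = FALSE`, which is an assumption made here so that `date` is a plain character column (with a factor column, `complete()` would expand over all factor levels instead).

```r
library(dplyr)
library(tidyr)

# Same values as in the question; stringsAsFactors = FALSE is an assumption
# made for this sketch so that `date` is character rather than factor.
dat <- data.frame(
  Country = c("G", "G", "G", "G"),
  Place   = c("G1", "G1", "G2", "G2"),
  date    = c("2017-01-01", "2017-01-03", "2017-04-02", "2017-04-05"),
  stringsAsFactors = FALSE
)

# complete(date) only uses the values already present in each group,
# so no rows are added.
unchanged <- dat %>%
  group_by(Country, Place) %>%
  complete(date) %>%
  ungroup()

# Supplying the full daily sequence gives complete() new values to fill in.
filled <- dat %>%
  mutate(date = as.Date(date)) %>%
  group_by(Country, Place) %>%
  complete(date = seq.Date(min(date), max(date), by = "day")) %>%
  ungroup()

nrow(unchanged)
nrow(filled)
```

Here `unchanged` should still have the original 4 rows, while `filled` has the 7 rows shown above.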

Answer 1: (score: 2)

You can also do this:

library(tidyverse)

group_by(dat, Country, Place) %>% 
  expand(date = full_seq(as.Date(date), 1)) %>% 
  ungroup()

# # A tibble: 7 x 3
#   Country Place date      
#   <fct>   <fct> <date>    
# 1 G       G1    2017-01-01
# 2 G       G1    2017-01-02
# 3 G       G1    2017-01-03
# 4 G       G2    2017-04-02
# 5 G       G2    2017-04-03
# 6 G       G2    2017-04-04
# 7 G       G2    2017-04-05
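
One practical difference from the `complete()` approach may matter if the data has more columns than the question shows: `expand()` returns only the grouping columns and the expanded column, whereas `complete()` keeps the other columns and puts `NA` in the newly created rows. A minimal sketch, where `dat2` and its `value` column are hypothetical and not part of the question:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the question's columns plus an invented `value` column.
dat2 <- data.frame(
  Country = c("G", "G"),
  Place   = c("G1", "G1"),
  date    = as.Date(c("2017-01-01", "2017-01-03")),
  value   = c(10, 30),
  stringsAsFactors = FALSE
)

# complete() keeps `value` and fills the newly created row with NA.
with_complete <- dat2 %>%
  group_by(Country, Place) %>%
  complete(date = full_seq(date, 1)) %>%
  ungroup()

# expand() returns only the grouping columns and the expanded `date` column.
with_expand <- dat2 %>%
  group_by(Country, Place) %>%
  expand(date = full_seq(date, 1)) %>%
  ungroup()

names(with_complete)
names(with_expand)
```

If the extra columns are needed, `complete()` is the more direct choice; the result of `expand()` would have to be joined back to the original data.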