How do I replicate rows based on other columns in R to backfill data to a specific date?

Time: 2019-07-09 15:01:26

Tags: r

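I have a data frame with 7 variables and millions of rows. I want to create rows that "backfill" the data to a specific point in time, based on the instances that are already coded.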

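Instances are computed by Year, ID, Var1, Var2, and Number. You will notice that the date of the first instance differs across these "groups". For any group whose first instance is not on January 1, 2015, I need to "backfill" its data back to January 1, 2015.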

Here is the initial data frame:

Date <- c("4/1/2015", "5/1/2015","1/1/2015","2/1/2015","3/1/2015","4/1/2015","5/1/2015","3/1/2015","4/1/2015","5/1/2015")
Year <- 2015
ID <- c("123456", "123456", "234567", "234567", "234567", "234567", "234567", "123456", "123456", "123456")
Var1 <- c(1,1,2,2,2,2,2,1,1,1)
Var2 <- c(10,10,10,10,10,10,10,11,11,11)
Number <- c("0001", "0001", "0001","0001","0001","0001","0001","0002","0002","0002")
Instance <- c(1,2,1,2,3,4,5,1,2,3)
df <- data.frame(Date, Year, ID, Var1, Var2, Number, Instance)

Here is my expected output:

Date <- c("1/1/2015","2/1/2015","3/1/2015","4/1/2015", "5/1/2015","1/1/2015","2/1/2015","3/1/2015","4/1/2015","5/1/2015","1/1/2015","2/1/2015","3/1/2015","4/1/2015","5/1/2015")
Year <- 2015
ID <- c("123456","123456","123456","123456", "123456", "234567", "234567", "234567", "234567", "234567", "123456","123456","123456", "123456", "123456")
Var1 <- c(1,1,1,1,1,2,2,2,2,2,1,1,1,1,1)
Var2 <- c(10,10,10,10,10,10,10,10,10,10,11,11,11,11,11)
Number <- c("0001","0001","0001","0001", "0001", "0001","0001","0001","0001","0001","0002","0002","0002","0002","0002")
Instance <- c(0,0,0,1,2,1,2,3,4,5,0,0,1,2,3)
df <- data.frame(Date, Year, ID, Var1, Var2, Number, Instance)

1 Answer:

Answer 0: (score: 1)

After grouping by the columns of interest, one option is complete() from tidyr:

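library(tidyverse)
library(lubridate)
df %>% 
  # dmy() reads "4/1/2015" as day/month/year, i.e. 4 January 2015
  mutate(Date = dmy(Date)) %>% 
  group_by(Year, ID, Var1, Var2, Number) %>% 
  # per group: add every day from the first of the month (of the group's
  # first date) up to the last observed date; new rows get Instance = 0
  complete(Date = seq(floor_date(Date, 'month')[1], max(Date), 
        by = '1 day'), fill = list(Instance = 0)) %>%
  select(names(df))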
# A tibble: 15 x 7
# Groups:   Year, ID, Var1, Var2, Number [6]
#   Date        Year ID      Var1  Var2 Number Instance
#   <date>     <dbl> <fct>  <dbl> <dbl> <fct>     <dbl>
# 1 2015-01-01  2015 123456     1    10 0001          0
# 2 2015-01-02  2015 123456     1    10 0001          0
# 3 2015-01-03  2015 123456     1    10 0001          0
# 4 2015-01-04  2015 123456     1    10 0001          1
# 5 2015-01-05  2015 123456     1    10 0001          2
# 6 2015-01-01  2015 123456     1    11 0002          0
# 7 2015-01-02  2015 123456     1    11 0002          0
# 8 2015-01-03  2015 123456     1    11 0002          1
# 9 2015-01-04  2015 123456     1    11 0002          2
#10 2015-01-05  2015 123456     1    11 0002          3
#11 2015-01-01  2015 234567     2    10 0001          1
#12 2015-01-02  2015 234567     2    10 0001          2
#13 2015-01-03  2015 234567     2    10 0001          3
#14 2015-01-04  2015 234567     2    10 0001          4
#15 2015-01-05  2015 234567     2    10 0001          5
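
The answer above parses the dates with dmy(), so "4/1/2015" becomes 4 January 2015 and the gaps are filled day by day. If the dates in the question are instead month/day/year (an assumption, but one that matches the monthly steps in the expected output), a variant along the following lines may be closer to what was asked. This is only a sketch: the use of mdy(), the fixed start date of 2015-01-01, and the name monthly are choices made for illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Sample data from the question; stringsAsFactors = FALSE keeps ID and
# Number as character columns regardless of the R version
df <- data.frame(
  Date = c("4/1/2015", "5/1/2015", "1/1/2015", "2/1/2015", "3/1/2015",
           "4/1/2015", "5/1/2015", "3/1/2015", "4/1/2015", "5/1/2015"),
  Year = 2015,
  ID = c(rep("123456", 2), rep("234567", 5), rep("123456", 3)),
  Var1 = c(1, 1, 2, 2, 2, 2, 2, 1, 1, 1),
  Var2 = c(10, 10, 10, 10, 10, 10, 10, 11, 11, 11),
  Number = c(rep("0001", 7), rep("0002", 3)),
  Instance = c(1, 2, 1, 2, 3, 4, 5, 1, 2, 3),
  stringsAsFactors = FALSE
)

monthly <- df %>%
  # Assumption: dates are month/day/year, so "4/1/2015" is 1 April 2015
  mutate(Date = mdy(Date)) %>%
  group_by(Year, ID, Var1, Var2, Number) %>%
  # Per group: one row per month from the assumed start date (2015-01-01)
  # to the group's last observed date; backfilled rows get Instance = 0
  complete(Date = seq(as.Date("2015-01-01"), max(Date), by = "month"),
           fill = list(Instance = 0)) %>%
  ungroup() %>%
  # Restore the original column order
  select(all_of(names(df)))

print(monthly)
```

With the sample data this should give 15 rows, five per group, with Instance set to 0 on the backfilled months. Note that Date is now a Date column rather than a string, and that hard-coding the start date only makes sense if every group should begin at the same point; for data spanning several years, something like make_date(Year, 1, 1) may be a better fit.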