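I have a data frame with 7 variables and millions of rows. I want to create new rows that "backfill" the data to a specific point in time, based on the instances that have already been coded.
Instances are counted within groups defined by Year, ID, Var1, Var2 and Number. You'll notice that the date of the first instance differs between these "groups". For any group whose first instance is not on 1/1/2015, I need to "backfill" its data back to 1/1/2015, with Instance set to 0 on the added rows (as in the expected output below).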
Here is the initial data frame:
Date <- c("4/1/2015", "5/1/2015","1/1/2015","2/1/2015","3/1/2015","4/1/2015","5/1/2015","3/1/2015","4/1/2015","5/1/2015")
Year <- 2015
ID <- c("123456", "123456", "234567", "234567", "234567", "234567", "234567", "123456", "123456", "123456")
Var1 <- c(1,1,2,2,2,2,2,1,1,1)
Var2 <- c(10,10,10,10,10,10,10,11,11,11)
Number <- c("0001", "0001", "0001","0001","0001","0001","0001","0002","0002","0002")
Instance <- c(1,2,1,2,3,4,5,1,2,3)
df <- data.frame(Date, Year, ID, Var1, Var2, Number, Instance)
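To see which groups actually need backfilling, one can look up the earliest coded date in each group. The base R sketch below rebuilds the example data and assumes the dates are day/month/year, which is how the answer interprets them; first_dates and needs_backfill are illustrative names, not part of the original question.

```r
# Example data from the question (dates assumed to be day/month/year)
df <- data.frame(
  Date = c("4/1/2015", "5/1/2015", "1/1/2015", "2/1/2015", "3/1/2015",
           "4/1/2015", "5/1/2015", "3/1/2015", "4/1/2015", "5/1/2015"),
  Year = 2015,
  ID = c("123456", "123456", "234567", "234567", "234567",
         "234567", "234567", "123456", "123456", "123456"),
  Var1 = c(1, 1, 2, 2, 2, 2, 2, 1, 1, 1),
  Var2 = c(10, 10, 10, 10, 10, 10, 10, 11, 11, 11),
  Number = c("0001", "0001", "0001", "0001", "0001",
             "0001", "0001", "0002", "0002", "0002"),
  Instance = c(1, 2, 1, 2, 3, 4, 5, 1, 2, 3),
  stringsAsFactors = FALSE
)
df$Date <- as.Date(df$Date, format = "%d/%m/%Y")

# Earliest coded date in each Year/ID/Var1/Var2/Number group
first_dates <- aggregate(Date ~ Year + ID + Var1 + Var2 + Number,
                         data = df, FUN = min)

# A group needs backfilling if its first instance is after 1 January 2015
first_dates$needs_backfill <- first_dates$Date > as.Date("2015-01-01")
print(first_dates)
```

On this data, the two groups for ID 123456 start after 1 January 2015 and are flagged, while the 234567 group already starts on that date.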
Here is my expected output:
Date <- c("1/1/2015","2/1/2015","3/1/2015","4/1/2015", "5/1/2015","1/1/2015","2/1/2015","3/1/2015","4/1/2015","5/1/2015","1/1/2015","2/1/2015","3/1/2015","4/1/2015","5/1/2015")
Year <- 2015
ID <- c("123456","123456","123456","123456", "123456", "234567", "234567", "234567", "234567", "234567", "123456","123456","123456", "123456", "123456")
Var1 <- c(1,1,1,1,1,2,2,2,2,2,1,1,1,1,1)
Var2 <- c(10,10,10,10,10,10,10,10,10,10,11,11,11,11,11)
Number <- c("0001","0001","0001","0001", "0001", "0001","0001","0001","0001","0001","0002","0002","0002","0002","0002")
Instance <- c(0,0,0,1,2,1,2,3,4,5,0,0,1,2,3)
expected <- data.frame(Date, Year, ID, Var1, Var2, Number, Instance)  # named 'expected' so the input df above is not overwritten
Answer (score: 1)
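One option is tidyr's complete(), applied after grouping by the columns of interest. The dates are parsed as day/month/year with dmy(), so the missing rows are filled in one day at a time: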
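library(tidyverse)
library(lubridate)
df %>%
  # parse the "d/m/Y" strings into Date objects
  mutate(Date = dmy(Date)) %>%
  # one group per Year/ID/Var1/Var2/Number combination
  group_by(Year, ID, Var1, Var2, Number) %>%
  # add every missing day from the 1st of the group's first month up to its
  # last date; the added rows get Instance = 0
  complete(Date = seq(floor_date(min(Date), 'month'), max(Date),
                      by = '1 day'),
           fill = list(Instance = 0)) %>%
  # restore the original column order
  select(names(df))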
# A tibble: 15 x 7
# Groups: Year, ID, Var1, Var2, Number [3]
# Date Year ID Var1 Var2 Number Instance
# <date> <dbl> <fct> <dbl> <dbl> <fct> <dbl>
# 1 2015-01-01 2015 123456 1 10 0001 0
# 2 2015-01-02 2015 123456 1 10 0001 0
# 3 2015-01-03 2015 123456 1 10 0001 0
# 4 2015-01-04 2015 123456 1 10 0001 1
# 5 2015-01-05 2015 123456 1 10 0001 2
# 6 2015-01-01 2015 123456 1 11 0002 0
# 7 2015-01-02 2015 123456 1 11 0002 0
# 8 2015-01-03 2015 123456 1 11 0002 1
# 9 2015-01-04 2015 123456 1 11 0002 2
#10 2015-01-05 2015 123456 1 11 0002 3
#11 2015-01-01 2015 234567 2 10 0001 1
#12 2015-01-02 2015 234567 2 10 0001 2
#13 2015-01-03 2015 234567 2 10 0001 3
#14 2015-01-04 2015 234567 2 10 0001 4
#15 2015-01-05 2015 234567 2 10 0001 5
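For comparison, here is a dependency-free base R sketch of the same backfill. It assumes the target is the fixed date 2015-01-01 from the question (rather than the first day of each group's month) and reads the dates as day/month/year, as the answer does; keys, start and filled are illustrative names. Splitting into one data frame per group is easy to follow but is not tuned for millions of rows.

```r
# Example data from the question (dates assumed to be day/month/year)
df <- data.frame(
  Date = c("4/1/2015", "5/1/2015", "1/1/2015", "2/1/2015", "3/1/2015",
           "4/1/2015", "5/1/2015", "3/1/2015", "4/1/2015", "5/1/2015"),
  Year = 2015,
  ID = c("123456", "123456", "234567", "234567", "234567",
         "234567", "234567", "123456", "123456", "123456"),
  Var1 = c(1, 1, 2, 2, 2, 2, 2, 1, 1, 1),
  Var2 = c(10, 10, 10, 10, 10, 10, 10, 11, 11, 11),
  Number = c("0001", "0001", "0001", "0001", "0001",
             "0001", "0001", "0002", "0002", "0002"),
  Instance = c(1, 2, 1, 2, 3, 4, 5, 1, 2, 3),
  stringsAsFactors = FALSE
)
df$Date <- as.Date(df$Date, format = "%d/%m/%Y")

keys  <- c("Year", "ID", "Var1", "Var2", "Number")  # columns that define a group
start <- as.Date("2015-01-01")                       # assumed fixed backfill target

# Process each Year/ID/Var1/Var2/Number group separately
groups <- split(df, df[keys], drop = TRUE)
filled <- do.call(rbind, lapply(groups, function(g) {
  # Every day from the backfill target to the group's last coded date
  all_dates <- data.frame(Date = seq(start, max(g$Date), by = "day"))
  # by = NULL makes merge() return the Cartesian product: key values x dates
  grid <- merge(g[1, keys], all_dates, by = NULL)
  # Left join the coded rows; days without a coded instance get NA ...
  out <- merge(grid, g, by = c(keys, "Date"), all.x = TRUE)
  # ... which are backfilled as Instance = 0
  out$Instance[is.na(out$Instance)] <- 0
  out
}))
rownames(filled) <- NULL
filled <- filled[, names(df)]  # original column order
print(filled)
```

If the dates are really month/day/year (monthly data), changing the format to "%m/%d/%Y" and by = "day" to by = "month" should give the monthly version of the same 15-row result.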