I want to left join panel data because some observations are missing. However, I can't manage to do this and keep the panel structure:
Data:
# package I'm using
library(dplyr)
dates <- as.Date(as.character(c("2015-02-13",
"2015-02-14",
"2015-02-16",
"2015-02-17",
"2015-02-14",
"2015-02-16",
"2015-02-13",
"2015-02-14",
"2015-02-17")))
b <-c("John","John","John","John","Michael","Michael","Thomas","Thomas","Thomas")
c <- c(20,30,26,20,30,40,5,10,4)
d <- c(11,2233,12,2,22,13,23,23,100)
# put together
df <- data.frame(b, dates,c,d)
df
b dates c d
#1 John 2015-02-13 20 11
#2 John 2015-02-14 30 2233
#3 John 2015-02-16 26 12
#4 John 2015-02-17 20 2
#5 Michael 2015-02-14 30 22
#6 Michael 2015-02-16 40 13
#7 Thomas 2015-02-13 5 23
#8 Thomas 2015-02-14 10 23
#9 Thomas 2015-02-17 4 100
What I tried was to create a complete vector of dates and left join onto it:
# full date sequence, created directly with the column name "date":
date <- data.frame(date = seq(as.Date("2015-02-13"), as.Date("2015-02-17"), by = "days"))
# and left join:
t <- left_join(date,df,by=c("date"="dates"))
t
date b c d
#1 2015-02-13 John 20 11
#2 2015-02-13 Thomas 5 23
#3 2015-02-14 John 30 2233
#4 2015-02-14 Michael 30 22
#5 2015-02-14 Thomas 10 23
#6 2015-02-15 <NA> NA NA
#7 2015-02-16 John 26 12
#8 2015-02-16 Michael 40 13
#9 2015-02-17 John 20 2
#10 2015-02-17 Thomas 4 100
How can I achieve a result like this:
b dates c d
#1 John 2015-02-13 20 11
#2 John 2015-02-14 30 2233
#3 John 2015-02-15 NA NA
#4 John 2015-02-16 26 12
#5 John 2015-02-17 20 2
#6 Michael 2015-02-13 NA NA
#7 Michael 2015-02-14 30 22
#8 Michael 2015-02-15 NA NA
#9 Michael 2015-02-16 40 13
#10 Michael 2015-02-17 NA NA
#11 Thomas 2015-02-13 5 23
#12 Thomas 2015-02-14 10 23
#13 Thomas 2015-02-15 NA NA
#14 Thomas 2015-02-16 NA NA
#15 Thomas 2015-02-17 4 100
Answer 0 (score: 5)
We can use expand.grid to build every combination of b and date, then left join the data onto it:
library(dplyr)
# the date column in df is named "dates", so the grid uses the same name
expand.grid(b = unique(df$b),
            dates = seq(min(df$dates), max(df$dates), by = "1 day")) %>%
  left_join(df, by = c("b", "dates")) %>%
  arrange(b, dates)
# b dates c d
#1 John 2015-02-13 20 11
#2 John 2015-02-14 30 2233
#3 John 2015-02-15 NA NA
#4 John 2015-02-16 26 12
#5 John 2015-02-17 20 2
#6 Michael 2015-02-13 NA NA
#7 Michael 2015-02-14 30 22
#8 Michael 2015-02-15 NA NA
#9 Michael 2015-02-16 40 13
#10 Michael 2015-02-17 NA NA
#11 Thomas 2015-02-13 5 23
#12 Thomas 2015-02-14 10 23
#13 Thomas 2015-02-15 NA NA
#14 Thomas 2015-02-16 NA NA
#15 Thomas 2015-02-17 4 100
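For comparison, the same idea can be sketched in base R without dplyr. This is only an illustration under the assumption that the data look exactly like the question's example; the object names `grid` and `panel` are made up here and are not part of the original answer.

```r
# Rebuild the example data (column names as in the question)
df <- data.frame(
  b = c("John", "John", "John", "John", "Michael", "Michael",
        "Thomas", "Thomas", "Thomas"),
  dates = as.Date(c("2015-02-13", "2015-02-14", "2015-02-16", "2015-02-17",
                    "2015-02-14", "2015-02-16",
                    "2015-02-13", "2015-02-14", "2015-02-17")),
  c = c(20, 30, 26, 20, 30, 40, 5, 10, 4),
  d = c(11, 2233, 12, 2, 22, 13, 23, 23, 100)
)

# One row per (person, day): 3 people x 5 days = 15 rows
grid <- expand.grid(
  b = unique(df$b),
  dates = seq(min(df$dates), max(df$dates), by = "1 day"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every grid row; c and d become NA where df has no match
panel <- merge(grid, df, by = c("b", "dates"), all.x = TRUE)
panel <- panel[order(panel$b, panel$dates), ]

nrow(grid)           # 15
sum(is.na(panel$c))  # 6 (15 grid rows minus 9 observed rows)
```

The key point is that the scaffold must contain one row per person and day. Joining onto a date-only vector, as in the question, yields a single NA row for 2015-02-15 instead of one per person.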
Or use complete from tidyr:
library(tidyr)
# refer to the existing "dates" column so min()/max() use the data, not a global object
complete(df, b, dates = seq(min(dates), max(dates), by = "1 day"))
# b dates c d
# <fctr> <date> <dbl> <dbl>
#1 John 2015-02-13 20 11
#2 John 2015-02-14 30 2233
#3 John 2015-02-15 NA NA
#4 John 2015-02-16 26 12
#5 John 2015-02-17 20 2
#6 Michael 2015-02-13 NA NA
#7 Michael 2015-02-14 30 22
#8 Michael 2015-02-15 NA NA
#9 Michael 2015-02-16 40 13
#10 Michael 2015-02-17 NA NA
#11 Thomas 2015-02-13 5 23
#12 Thomas 2015-02-14 10 23
#13 Thomas 2015-02-15 NA NA
#14 Thomas 2015-02-16 NA NA
#15 Thomas 2015-02-17 4 100
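As a possible variation, tidyr also provides full_seq(), which can stand in for the explicit seq(min(...), max(...)) call. The sketch below assumes daily data (a period of 1 day) and the question's column names; the object name `panel` is only illustrative.

```r
library(tidyr)

# Rebuild the example data (column names as in the question)
df <- data.frame(
  b = c("John", "John", "John", "John", "Michael", "Michael",
        "Thomas", "Thomas", "Thomas"),
  dates = as.Date(c("2015-02-13", "2015-02-14", "2015-02-16", "2015-02-17",
                    "2015-02-14", "2015-02-16",
                    "2015-02-13", "2015-02-14", "2015-02-17")),
  c = c(20, 30, 26, 20, 30, 40, 5, 10, 4),
  d = c(11, 2233, 12, 2, 22, 13, 23, 23, 100)
)

# full_seq() fills in every day between the earliest and latest observed date;
# complete() then adds the missing (b, dates) combinations with NA in c and d
panel <- complete(df, b, dates = full_seq(dates, period = 1))

nrow(panel)          # 15
sum(is.na(panel$d))  # 6
```

If the missing values should be something other than NA, complete() accepts a fill argument, for example fill = list(c = 0, d = 0); whether that is appropriate depends on what a missing observation means in the panel.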