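I have several data frames in panel-data format. I now want to merge these panel data frames into a single panel data set. The data frames have some things in common and some differences, which I illustrate below: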
DF1:
Month variable Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
Jan-05 A 1 2 3 4 5 6
Feb-05 A 2 3 4 5 6 7
Mar-05 A 3 4 5 6 7 8
Apr-05 A 4 5 6 7 8 9
May-05 A 5 6 7 8 9 10
Jun-05 A 6 7 8 9 10 11
Jul-05 A 7 8 9 10 11 12
Aug-05 A 8 9 10 11 12 13
Sep-05 A 9 10 11 12 13 14
Oct-05 A 10 11 12 13 14 15
Nov-05 A 11 12 13 14 15 16
Dec-05 A 12 13 14 15 16 17
Jan-05 B 12 12 12 12 12 12
Feb-05 B 12 12 12 12 12 12
Mar-05 B 12 12 12 12 12 12
Apr-05 B 12 12 12 12 12 12
May-05 B 12 12 12 12 12 12
Jun-05 B 12 12 12 12 12 12
Jul-05 B 12 12 12 12 12 12
Aug-05 B 12 12 12 12 12 12
Sep-05 B 12 12 12 12 12 12
Oct-05 B 12 12 12 12 12 12
Nov-05 B 12 12 12 12 12 12
Dec-05 B 12 12 12 12 12 12
DF2:
Month variable Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
Jan-06 A 1 2 3 4 5 6
Feb-06 A 2 3 4 5 6 7
Mar-06 A 3 4 5 6 7 8
Apr-06 A 4 5 6 7 8 9
May-06 A 5 6 7 8 9 10
Jun-06 A 6 7 8 9 10 11
Jul-06 A 7 8 9 10 11 12
Aug-06 A 8 9 10 11 12 13
Sep-06 A 9 10 11 12 13 14
Oct-06 A 10 11 12 13 14 15
Nov-06 A 11 12 13 14 15 16
Dec-06 A 12 13 14 15 16 17
Jan-06 C 12 12 12 12 12 12
Feb-06 C 12 12 12 12 12 12
Mar-06 C 12 12 12 12 12 12
Apr-06 C 12 12 12 12 12 12
May-06 C 12 12 12 12 12 12
Jun-06 C 12 12 12 12 12 12
Jul-06 C 12 12 12 12 12 12
Aug-06 C 12 12 12 12 12 12
Sep-06 C 12 12 12 12 12 12
Oct-06 C 12 12 12 12 12 12
Nov-06 C 12 12 12 12 12 12
Dec-06 C 12 12 12 12 12 12
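The desired output is shown below. I want to merge the panel data frames so that each variable is arranged in long format, and if a variable has no data for a given year, it gets NA under Beta1, Beta2, and so on.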
Month variable Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
Jan-05 A 1 2 3 4 5 6
Feb-05 A 2 3 4 5 6 7
Mar-05 A 3 4 5 6 7 8
Apr-05 A 4 5 6 7 8 9
May-05 A 5 6 7 8 9 10
Jun-05 A 6 7 8 9 10 11
Jul-05 A 7 8 9 10 11 12
Aug-05 A 8 9 10 11 12 13
Sep-05 A 9 10 11 12 13 14
Oct-05 A 10 11 12 13 14 15
Nov-05 A 11 12 13 14 15 16
Dec-05 A 12 13 14 15 16 17
Jan-06 A 1 2 3 4 5 6
Feb-06 A 2 3 4 5 6 7
Mar-06 A 3 4 5 6 7 8
Apr-06 A 4 5 6 7 8 9
May-06 A 5 6 7 8 9 10
Jun-06 A 6 7 8 9 10 11
Jul-06 A 7 8 9 10 11 12
Aug-06 A 8 9 10 11 12 13
Sep-06 A 9 10 11 12 13 14
Oct-06 A 10 11 12 13 14 15
Nov-06 A 11 12 13 14 15 16
Dec-06 A 12 13 14 15 16 17
Jan-05 B 12 12 12 12 12 12
Feb-05 B 12 12 12 12 12 12
Mar-05 B 12 12 12 12 12 12
Apr-05 B 12 12 12 12 12 12
May-05 B 12 12 12 12 12 12
Jun-05 B 12 12 12 12 12 12
Jul-05 B 12 12 12 12 12 12
Aug-05 B 12 12 12 12 12 12
Sep-05 B 12 12 12 12 12 12
Oct-05 B 12 12 12 12 12 12
Nov-05 B 12 12 12 12 12 12
Dec-05 B 12 12 12 12 12 12
Jan-06 B NA NA NA NA NA NA
Feb-06 B NA NA NA NA NA NA
Mar-06 B NA NA NA NA NA NA
Apr-06 B NA NA NA NA NA NA
May-06 B NA NA NA NA NA NA
Jun-06 B NA NA NA NA NA NA
Jul-06 B NA NA NA NA NA NA
Aug-06 B NA NA NA NA NA NA
Sep-06 B NA NA NA NA NA NA
Oct-06 B NA NA NA NA NA NA
Nov-06 B NA NA NA NA NA NA
Dec-06 B NA NA NA NA NA NA
Jan-05 C NA NA NA NA NA NA
Feb-05 C NA NA NA NA NA NA
Mar-05 C NA NA NA NA NA NA
Apr-05 C NA NA NA NA NA NA
May-05 C NA NA NA NA NA NA
Jun-05 C NA NA NA NA NA NA
Jul-05 C NA NA NA NA NA NA
Aug-05 C NA NA NA NA NA NA
Sep-05 C NA NA NA NA NA NA
Oct-05 C NA NA NA NA NA NA
Nov-05 C NA NA NA NA NA NA
Dec-05 C NA NA NA NA NA NA
Jan-06 C 12 12 12 12 12 12
Feb-06 C 12 12 12 12 12 12
Mar-06 C 12 12 12 12 12 12
Apr-06 C 12 12 12 12 12 12
May-06 C 12 12 12 12 12 12
Jun-06 C 12 12 12 12 12 12
Jul-06 C 12 12 12 12 12 12
Aug-06 C 12 12 12 12 12 12
Sep-06 C 12 12 12 12 12 12
Oct-06 C 12 12 12 12 12 12
Nov-06 C 12 12 12 12 12 12
Dec-06 C 12 12 12 12 12 12
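As I mentioned, I have several data frames, and merging them may produce hundreds of thousands of rows, so I need a solution that is efficient in memory and space. I would greatly appreciate your help.
Answer 0 (score: 5)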
There is a function for that. Combine the data frames with rbind. Then use complete. It will look through the groups in variable and fill in any that are missing values:
library(tidyr)
library(zoo)  # provides as.yearmon
df3 <- do.call(rbind.data.frame, list(df1, df2))
df3$Month <- as.character(df3$Month)
df4 <- complete(df3, Month, variable)
# parse strings such as "Jan-05" into year-month values so they sort chronologically
df4$Month <- as.yearmon(df4$Month, "%b-%y")
df5 <- df4[order(df4$variable,df4$Month),]
df5
# Source: local data frame [72 x 8]
#
# Month variable Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
# (yrmn) (fctr) (int) (int) (int) (int) (int) (int)
# 1 Jan 2005 A 1 2 3 4 5 6
# 2 Feb 2005 A 2 3 4 5 6 7
# 3 Mar 2005 A 3 4 5 6 7 8
# 4 Apr 2005 A 4 5 6 7 8 9
# 5 May 2005 A 5 6 7 8 9 10
# 6 Jun 2005 A 6 7 8 9 10 11
# 7 Jul 2005 A 7 8 9 10 11 12
# 8 Aug 2005 A 8 9 10 11 12 13
# 9 Sep 2005 A 9 10 11 12 13 14
# 10 Oct 2005 A 10 11 12 13 14 15
# .. ... ... ... ... ... ... ... ...
An alternative implementation with dplyr and tidyr:
library(dplyr)
library(tidyr)
df3 <- bind_rows(df1, df2) %>%
  complete(Month, variable)
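To try the bind-then-complete idea without the full data, here is a minimal sketch on toy data. The data frames df1 and df2 below are made-up two-month stand-ins for the question's DF1 and DF2 with a single Beta1 column, and the name panel is likewise only illustrative.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature versions of DF1 and DF2 (not the question's full data)
df1 <- data.frame(Month = c("Jan-05", "Feb-05", "Jan-05", "Feb-05"),
                  variable = c("A", "A", "B", "B"),
                  Beta1 = c(1L, 2L, 12L, 12L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Month = c("Jan-06", "Feb-06", "Jan-06", "Feb-06"),
                  variable = c("A", "A", "C", "C"),
                  Beta1 = c(1L, 2L, 12L, 12L),
                  stringsAsFactors = FALSE)

# Stack the frames, then add a row for every missing Month/variable pair
panel <- bind_rows(df1, df2) %>%
  complete(Month, variable)

# 4 months x 3 variables = 12 rows; the 4 added rows carry NA in Beta1
nrow(panel)
sum(is.na(panel$Beta1))
```

Variable B has no rows for 2006 and C has none for 2005, so those four Month/variable pairs are the ones that come back filled with NA.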
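Answer 1 (score: 4)
Two other possible alternatives, of which the data.table alternative(s) are especially of interest when speed and memory are an issue:
base R:
Bind the data frames together into one: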
df3 <- rbind(df1,df2)
Create a reference data frame with all possible combinations of Month and variable using expand.grid:
ref <- expand.grid(Month = unique(df3$Month), variable = unique(df3$variable))
Merge them with all.x=TRUE to make sure the missing combinations are filled with NA values:
merge(ref, df3, by = c("Month", "variable"), all.x = TRUE)
Or (thanks to @PierreLafortune):
merge(ref, df3, by=1:2, all.x = TRUE)
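The base R steps can be run end to end on toy data as in the sketch below. The data frames df1 and df2 are made-up two-month stand-ins for the question's data, and the final sorting step is an addition of this sketch rather than part of the answer: it assumes Month strings look like "Jan-05" and that the session locale uses English month abbreviations.

```r
# Hypothetical miniature versions of DF1 and DF2 (not the question's full data)
df1 <- data.frame(Month = c("Jan-05", "Feb-05", "Jan-05", "Feb-05"),
                  variable = c("A", "A", "B", "B"),
                  Beta1 = c(1L, 2L, 12L, 12L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Month = c("Jan-06", "Feb-06", "Jan-06", "Feb-06"),
                  variable = c("A", "A", "C", "C"),
                  Beta1 = c(1L, 2L, 12L, 12L),
                  stringsAsFactors = FALSE)

# Stack the frames and build every Month/variable combination
df3 <- rbind(df1, df2)
ref <- expand.grid(Month = unique(df3$Month),
                   variable = unique(df3$variable),
                   stringsAsFactors = FALSE)

# Left join: combinations absent from df3 get NA in the Beta columns
full <- merge(ref, df3, by = c("Month", "variable"), all.x = TRUE)

# Assumed extra step: order by variable, then chronologically, by parsing
# "Jan-05" as the first day of that month (locale-dependent month names)
month_date <- as.Date(paste0("01-", full$Month), format = "%d-%b-%y")
full <- full[order(full$variable, month_date), ]
```

Sorting on a parsed date rather than on the Month string matters because alphabetical order would put "Feb-05" before "Jan-05".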
data.table:
Bind the data frames into one with 'rbindlist', which returns a 'data.table':
library(data.table)
DT <- rbindlist(list(df1,df2))
Join with a reference to make sure all combinations are present and the missing ones are filled with NA:
DT[CJ(Month = Month, variable = variable, unique = TRUE), on = c("Month", "variable")]
Everything together in one call:
DT <- rbindlist(list(df1,df2))[CJ(Month = Month, variable = variable, unique = TRUE), on = c("Month", "variable")]
An alternative is to wrap rbindlist in setkey and then expand with CJ (cross join):
DT <- setkey(rbindlist(list(df1,df2)), Month, variable)[CJ(Month, variable, unique = TRUE)]
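As a quick check of the data.table route, the sketch below runs the cross join on toy data. The data frames df1 and df2 are made-up two-month stand-ins for the question's data, and naming the CJ columns explicitly is a choice made here so that the join does not depend on how a given data.table version names unnamed CJ arguments.

```r
library(data.table)

# Hypothetical miniature versions of DF1 and DF2 (not the question's full data)
df1 <- data.frame(Month = c("Jan-05", "Feb-05", "Jan-05", "Feb-05"),
                  variable = c("A", "A", "B", "B"),
                  Beta1 = c(1L, 2L, 12L, 12L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Month = c("Jan-06", "Feb-06", "Jan-06", "Feb-06"),
                  variable = c("A", "A", "C", "C"),
                  Beta1 = c(1L, 2L, 12L, 12L),
                  stringsAsFactors = FALSE)

# rbindlist stacks the frames and returns a data.table
DT <- rbindlist(list(df1, df2))

# Cross join of the unique Month and variable values, used as the lookup
# table; pairs that do not occur in DT come back with NA in Beta1
full <- DT[CJ(Month = Month, variable = variable, unique = TRUE),
           on = c("Month", "variable")]

nrow(full)               # 4 months x 3 variables
sum(is.na(full$Beta1))   # rows created for the missing pairs
```

The result has one row per Month/variable pair; it can then be ordered by variable and date in the same way as with the other approaches.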