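Combining multiple data frames in R

Time: 2018-07-05 14:02:30

Tags: r dataframe

I have three data frames. I want to join or merge them so that they are all keyed on Days (although not every column has data for every day), while keeping all of the other columns.

I have prepared the following dummy data: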

# Create three dataframes of dummy data
df = data.frame(matrix(rnorm(20), nrow=10))
df2 = data.frame(matrix(rnorm(15), nrow=5))
df3 = data.frame(matrix(rnorm(30), nrow=10))

Days = 1:10
Days2 = seq(from =5, to=9)

df1_all <- data.frame(Days, df)
colnames(df1_all) <- c("Days", "Survey1", "Survey2")
df2_all <- data.frame(Days2, df2)
colnames(df2_all) <- c("Days", "Survey3", "Survey4", "Survey5")
df3_all <- data.frame(Days, df3)
colnames(df3_all) <- c("Days", "Survey6", "Survey7", "Survey8")

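How can I combine these three data frames so that they share a common Days column while all of the survey columns are retained?

As you can see, df1_all and df3_all cover days 1 to 10, but df2_all covers only days 5 to 9.

2 answers:

Answer 0: (score: 3)

In base R, we can use merge together with Reduce: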

Reduce(function(x, y) merge(x, y, by = "Days", all = TRUE), list(df1_all, df2_all, df3_all))
#   Days    Survey1      Survey2      Survey3    Survey4    Survey5     Survey6
#1     1 -0.4968500 -1.157808548           NA         NA         NA  0.85023226
#2     2 -1.8060313  0.656588464           NA         NA         NA  0.69760871
#3     3 -0.5820759  2.548991071           NA         NA         NA  0.54999735
#4     4 -1.1088896 -0.034760390           NA         NA         NA -0.40273198
#5     5 -1.0149620 -0.669633580  0.336472797  2.0702709 -0.3170591 -0.19159377
#6     6 -0.1623095 -0.007604756  0.006892838 -0.1533984 -0.1777900 -1.19452788
#7     7  0.5630558  1.777084448 -0.455468738 -1.3907009 -0.1699941 -0.05315882
#8     8  1.6478175 -1.138607737 -0.366523933 -0.7235818 -1.3723019  0.25519600
#9     9 -0.7733534  1.367827179  0.648286568  0.2582618 -0.1737872  1.70596401
#10   10  1.6059096  1.329564791           NA         NA         NA  1.00151325
#       Survey7     Survey8
#1  -0.49558344 -0.82599859
#2   0.35555030  0.16698928
#3  -1.13460804 -0.89626463
#4   0.87820363  0.16818539
#5   0.97291675  0.35496826
#6   2.12111711 -0.05210512
#7   0.41452353 -0.19593462
#8  -0.47471847 -0.64906975
#9   0.06599349 -1.10976723
#10 -0.50247778  0.84927420

Or use dplyr::full_join:

Reduce(dplyr::full_join, list(df1_all, df2_all, df3_all))

This gives the same result (full_join automatically identifies the common column to join on).
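
To see why all = TRUE matters in the merge call, here is a minimal base R sketch. The data frames s1 and s2 and their values are made up for illustration; they only mimic the shape of df1_all (days 1 to 10) and df2_all (days 5 to 9).

```r
# Hypothetical, deterministic stand-ins for df1_all and df2_all
s1 <- data.frame(Days = 1:10, Survey1 = (1:10) / 10)
s2 <- data.frame(Days = 5:9,  Survey3 = c(50, 60, 70, 80, 90))

# Default merge() is an inner join: only days present in both frames survive
inner <- merge(s1, s2, by = "Days")

# all = TRUE requests a full outer join: every day is kept, gaps become NA
outer <- merge(s1, s2, by = "Days", all = TRUE)

nrow(inner)                # 5  (days 5 to 9)
nrow(outer)                # 10 (days 1 to 10)
sum(is.na(outer$Survey3))  # 5  (days 1 to 4 and day 10)
```

Without all = TRUE, the days that are missing from the shorter data frame would be dropped from the combined result.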

Answer 1: (score: 0)

Using dplyr:

library(dplyr)
df1_all %>% full_join(df2_all) %>% full_join(df3_all)
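
A variant of the same pipeline, sketched with the join column named explicitly. Passing by = "Days" avoids relying on automatic detection of shared column names, which could pick up extra columns if two data frames happened to share more than one name. The data frames s1, s2 and s3 are made-up stand-ins for the three in the question, and the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-ins for df1_all, df2_all and df3_all
s1 <- data.frame(Days = 1:10, Survey1 = (1:10) / 10)
s2 <- data.frame(Days = 5:9,  Survey3 = c(50, 60, 70, 80, 90))
s3 <- data.frame(Days = 1:10, Survey6 = 10:1)

# Chain two full joins, each keyed explicitly on Days
combined <- s1 %>%
  full_join(s2, by = "Days") %>%
  full_join(s3, by = "Days")

dim(combined)  # 10 rows, 4 columns
```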