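I have three data frames. I want to join or merge them so that they are all keyed on Days (not every column has data for every day), while keeping all of the other columns.
I have prepared the following dummy data: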
# Create three dataframes of dummy data
df = data.frame(matrix(rnorm(20), nrow=10))
df2 = data.frame(matrix(rnorm(15), nrow=5))
df3 = data.frame(matrix(rnorm(30), nrow=10))
Days = 1:10
Days2 = seq(from =5, to=9)
df1_all <- data.frame(Days, df)
colnames(df1_all) <- c("Days", "Survey1", "Survey2")
df2_all <- data.frame(Days2, df2)
colnames(df2_all) <- c("Days", "Survey3", "Survey4", "Survey5")
df3_all <- data.frame(Days, df3)
colnames(df3_all) <- c("Days", "Survey6", "Survey7", "Survey8")
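How can I combine these three data frames so that they share a common Days column while all of the survey columns are kept?
As you can see, df1_all and df3_all cover days 1 to 10, but df2_all only covers days 5 to 9.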
Answer 0 (score: 3)
In base R, we can use merge together with Reduce:
Reduce(function(x, y) merge(x, y, by = "Days", all = TRUE), list(df1_all, df2_all, df3_all))
# Days Survey1 Survey2 Survey3 Survey4 Survey5 Survey6
#1 1 -0.4968500 -1.157808548 NA NA NA 0.85023226
#2 2 -1.8060313 0.656588464 NA NA NA 0.69760871
#3 3 -0.5820759 2.548991071 NA NA NA 0.54999735
#4 4 -1.1088896 -0.034760390 NA NA NA -0.40273198
#5 5 -1.0149620 -0.669633580 0.336472797 2.0702709 -0.3170591 -0.19159377
#6 6 -0.1623095 -0.007604756 0.006892838 -0.1533984 -0.1777900 -1.19452788
#7 7 0.5630558 1.777084448 -0.455468738 -1.3907009 -0.1699941 -0.05315882
#8 8 1.6478175 -1.138607737 -0.366523933 -0.7235818 -1.3723019 0.25519600
#9 9 -0.7733534 1.367827179 0.648286568 0.2582618 -0.1737872 1.70596401
#10 10 1.6059096 1.329564791 NA NA NA 1.00151325
# Survey7 Survey8
#1 -0.49558344 -0.82599859
#2 0.35555030 0.16698928
#3 -1.13460804 -0.89626463
#4 0.87820363 0.16818539
#5 0.97291675 0.35496826
#6 2.12111711 -0.05210512
#7 0.41452353 -0.19593462
#8 -0.47471847 -0.64906975
#9 0.06599349 -1.10976723
#10 -0.50247778 0.84927420
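For reference, Reduce here just folds merge over the list from left to right, so the call above should be equivalent to two nested merges. A minimal sketch of that, rebuilding the dummy data with an arbitrary seed (the seed and the names folded and nested are only for illustration, not part of the question):

```r
# Rebuild the dummy data from the question (seed chosen arbitrarily)
set.seed(1)
df1_all <- data.frame(Days = 1:10, matrix(rnorm(20), nrow = 10))
colnames(df1_all) <- c("Days", "Survey1", "Survey2")
df2_all <- data.frame(Days = 5:9, matrix(rnorm(15), nrow = 5))
colnames(df2_all) <- c("Days", "Survey3", "Survey4", "Survey5")
df3_all <- data.frame(Days = 1:10, matrix(rnorm(30), nrow = 10))
colnames(df3_all) <- c("Days", "Survey6", "Survey7", "Survey8")

# Fold merge over the list of data frames ...
folded <- Reduce(function(x, y) merge(x, y, by = "Days", all = TRUE),
                 list(df1_all, df2_all, df3_all))

# ... which is the same as nesting the two merges by hand
nested <- merge(merge(df1_all, df2_all, by = "Days", all = TRUE),
                df3_all, by = "Days", all = TRUE)

identical(folded, nested)      # compare the two results
dim(folded)                    # rows x columns of the merged data frame
sum(is.na(folded$Survey3))     # days with no df2_all data are filled with NA
```

Setting all = TRUE is what makes this a full outer join: days that are missing from df2_all are kept and their Survey3 to Survey5 values are filled with NA.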
Or use dplyr::full_join:
Reduce(dplyr::full_join, list(df1_all, df2_all, df3_all))
This gives the same result (full_join automatically identifies the common columns to join on).
Answer 1 (score: 0)
Using dplyr:
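If you would rather not rely on that automatic detection (dplyr prints a message naming the columns it joined by), the key can be passed explicitly. A sketch under the same dummy-data assumptions; the seed, the anonymous wrapper function, and the name joined are only for illustration:

```r
library(dplyr)

# Rebuild the dummy data from the question (seed chosen arbitrarily)
set.seed(1)
df1_all <- data.frame(Days = 1:10, matrix(rnorm(20), nrow = 10))
colnames(df1_all) <- c("Days", "Survey1", "Survey2")
df2_all <- data.frame(Days = 5:9, matrix(rnorm(15), nrow = 5))
colnames(df2_all) <- c("Days", "Survey3", "Survey4", "Survey5")
df3_all <- data.frame(Days = 1:10, matrix(rnorm(30), nrow = 10))
colnames(df3_all) <- c("Days", "Survey6", "Survey7", "Survey8")

# Name the join key explicitly instead of letting full_join guess it
joined <- Reduce(function(x, y) full_join(x, y, by = "Days"),
                 list(df1_all, df2_all, df3_all))

dim(joined)                    # rows x columns of the joined data frame
names(joined)                  # Days followed by Survey1 ... Survey8
```

Unlike merge, full_join keeps the row order of the left-hand table rather than sorting by the key. Here that makes no difference, because df1_all already lists days 1 to 10 in order.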
library(dplyr)
df1_all %>% full_join(df2_all) %>% full_join(df3_all)