Keeping the key columns from both data frames after a join with merge

Time: 2017-05-19 15:52:52

Tags: r

I am merging two data frames with the following snippet:

check_store_count <- merge(x = agg_cluster_sku_str_cnt_1, y = agg_cluster_sku_str_cnt_2,
                           by.x = c("cluster_1", "sku_1"),
                           by.y = c("cluster_2", "sku_2"), all = TRUE)

After merging the two data frames, the result does not contain the "cluster_2" and "sku_2" fields. How can I get them in the result?

1 answer:

Answer 0: (score: 0)

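merge is designed to work this way. It finds the rows whose key columns match and combines them into new rows, keeping a single copy of each key column under its by.x name. Setting all = TRUE makes this a full outer join of the two data frames.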

df1 <- data.frame(a1 = c(1,2,3),
              b1 = c("a","b","c"),
              c1 = c(4,5,6))


df2 <- data.frame(a2 = c(1,2,4),
              b2 = c("c","b","d"),
              c2 = c(7,8,9))

merge(x = df1,
  y = df2,
  by.x = c("a1", "b1"),
  by.y = c("a2", "b2"),
  all = TRUE)

#   a1 b1 c1 c2
# 1  1  a  4 NA
# 2  1  c NA  7
# 3  2  b  5  8
# 4  3  c  6 NA
# 5  4  d NA  9

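You can duplicate the key columns and leave the duplicates out of the by list, so the copies are carried into the result.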

library(dplyr)
df2 <- df2 %>%
  mutate(a2_dup = a2,
         b2_dup = b2)

merge(x = df1,
      y = df2,
      by.x = c("a1", "b1"),
      by.y = c("a2", "b2"),
      all = TRUE)

#   a1 b1 c1 c2 a2_dup b2_dup
# 1  1  a  4 NA     NA   <NA>
# 2  1  c NA  7      1      c
# 3  2  b  5  8      2      b
# 4  3  c  6 NA     NA   <NA>
# 5  4  d NA  9      4      d
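
If you would rather not load dplyr, or want the result to keep the original names cluster_2 and sku_2, one variation is to merge on the copies instead of on the originals. The sketch below uses base R only. The data values, the store_cnt_1 and store_cnt_2 columns, and the *_key column names are made up for illustration; only the data frame names and the cluster/sku column names come from the question.

```r
# Hypothetical stand-ins for the two data frames in the question
agg_cluster_sku_str_cnt_1 <- data.frame(cluster_1 = c(1, 2, 3),
                                        sku_1 = c("a", "b", "c"),
                                        store_cnt_1 = c(4, 5, 6),
                                        stringsAsFactors = FALSE)
agg_cluster_sku_str_cnt_2 <- data.frame(cluster_2 = c(1, 2, 4),
                                        sku_2 = c("c", "b", "d"),
                                        store_cnt_2 = c(7, 8, 9),
                                        stringsAsFactors = FALSE)

# Copy the key columns of y under new (hypothetical) names
y <- agg_cluster_sku_str_cnt_2
y$cluster_2_key <- y$cluster_2
y$sku_2_key <- y$sku_2

# Merge on the copies; the originals are no longer key columns,
# so they are carried into the result under their own names
check_store_count <- merge(x = agg_cluster_sku_str_cnt_1, y = y,
                           by.x = c("cluster_1", "sku_1"),
                           by.y = c("cluster_2_key", "sku_2_key"),
                           all = TRUE)

names(check_store_count)
```

Rows that exist only in the first data frame will have NA in cluster_2 and sku_2, which also makes it easy to see which side each row came from.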