Question

I have two datasets, each a monthly summary of different household variables. I want to merge the two by household ID and month.

df1 looks like this:

     hh_ids      date total
     <chr>     <chr> <dbl>
1  KELDK13  2013-8-1     1
2  KMOMB02  2013-2-1     1
3  KMOMB02  2013-5-1     2
4  KMOMB04  2013-7-1     2
5  KMOMB04  2013-9-1     1
6  KMOMB06  2013-6-1     1
7  KMOMB14  2013-8-1     1
8  KMOMB16  2013-6-1     1
9  KMOMB17 2012-10-1     1
10 KMOMB17 2012-11-1     2

And the first 10 rows of df2 look like this:

    hh_ids      date    income consumption alcohol cleaning_materials  clothing
1  KELDK01 2012-11-1  62.70588    40.52941       0           0.000000  0.000000
2  KELDK01 2012-12-1  17.64706    42.43530       0           1.058824  7.058824
3  KELDK01 2013-01-1  91.76471    48.23529       0           0.000000  0.000000
4  KELDK01 2013-02-1  91.76470   107.52940       0           0.000000  0.000000
5  KELDK01 2013-03-1 116.47060   114.47060       0           0.000000  0.000000
6  KELDK01 2013-04-1 124.41180   118.29410       0           2.705882 17.647060
7  KELDK01 2013-05-1 137.23530   105.00000       0           1.411765  1.882353
8  KELDK01 2013-06-1 131.52940   109.54120       0           4.352942  2.941176
9  KELDK01 2013-07-1 121.52940   113.47060       0           2.352941 25.882350
10 KELDK01 2013-08-1 123.32940    86.50588       0           2.588235  2.941176

I want to add the "total" column to df2 as a new column, filled in wherever hh_ids and date match.

I tried the following:

df3<-merge(df2,df1,by=c("hh_ids","date"))

However, my df2 has 53 rows and df1 has more, yet the resulting df3 has only 14 rows. Any suggestions would be greatly appreciated!

Answer 1

If you want to keep all rows of df2, even those that do not match anything in df1, you can use the all.x argument of merge:

df3 <- merge(df2, df1, by=c("hh_ids","date"), all.x=TRUE)

This is equivalent to a SQL LEFT JOIN with df2 on the left and df1 on the right.
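To see the difference concretely, here is a minimal sketch on made-up toy data (the rows and values below are illustrative, not taken from the real df1/df2). It also checks one thing that may be worth ruling out given the samples in the question: df1 appears to write months without zero padding ("2013-8-1") while df2 pads them ("2013-08-1"), and as character keys those two strings would not match. Converting both columns with as.Date is one way to make them comparable; whether that is the actual cause of the missing rows is an assumption.

```r
# Toy data modelled on the samples in the question (values are illustrative).
df1 <- data.frame(
  hh_ids = c("KELDK01", "KELDK01"),
  date   = c("2012-11-1", "2013-8-1"),     # month not zero-padded
  total  = c(1, 2),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  hh_ids = c("KELDK01", "KELDK01", "KELDK01"),
  date   = c("2012-11-1", "2013-07-1", "2013-08-1"),  # month zero-padded
  income = c(62.7, 121.5, 123.3),
  stringsAsFactors = FALSE
)

# Default merge is an inner join: only rows whose keys match exactly survive.
inner <- merge(df2, df1, by = c("hh_ids", "date"))

# all.x = TRUE keeps every row of df2; unmatched rows get NA in total.
left <- merge(df2, df1, by = c("hh_ids", "date"), all.x = TRUE)

# Parsing the strings as Date makes "2013-8-1" and "2013-08-1" compare equal.
df1$date <- as.Date(df1$date, format = "%Y-%m-%d")
df2$date <- as.Date(df2$date, format = "%Y-%m-%d")
left_fixed <- merge(df2, df1, by = c("hh_ids", "date"), all.x = TRUE)
```

In this toy case the inner join keeps only the one row whose date strings are identical, the left join keeps all three df2 rows, and after date conversion the August row picks up its total as well.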
