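Merge two tables with pandas (Python) only when 3 values match

Time: 2017-04-08 23:43:52

Tags: python pandas dataframe

I need to combine two different datasets, matching rows only when 3 columns have the same values. For example: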

DF1

    iso3_o    iso3_d    year   value1   value2
      pak       tza     2000      123      456
      lby       vnm     2000      435      148
      can       jpn     2001      983      095
      civ       pa      2001      109      265
      bol       slv     2004      019      239

DF2

     origin   target    year  value_3  value_4
      pak       tza     2000      763      987
      lby       vnm     2001      349      274
      can       jpn     2002      238      095
      chl       geo     2000      109      236
      bol       slv     2004      047      384

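So, for rows from the two tables to be combined, the values must satisfy the following condition:

df1['iso3_o'] == df2['origin'] AND df1['iso3_d'] == df2['target'] AND df1['year'] == df2['year']

The combined table I need looks like this: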

    iso3_o    iso3_d    year   value1   value2  value_3  value_4
      pak       tza     2000      123      456      763      987
      lby       vnm     2000      435      148      NaN      NaN
      lby       vnm     2001      NaN      NaN      349      274
      can       jpn     2001      983      095      NaN      NaN
      can       jpn     2002      NaN      NaN      238      095
      civ       pa      2001      109      265      NaN      NaN
      bol       slv     2004      019      239      047      384
      chl       geo     2000      NaN      NaN      109      236

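1 answer:

Answer 0: (score: 3)

IIUC, we can rename the columns in one DF so that the "join" columns have the same names in both DFs. DataFrame.merge() will then merge on the intersection of the columns by default, which here is iso3_o, iso3_d and year. Using how='outer' keeps the rows that have no match in the other DF and fills the missing values with NaN: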

In [114]: df1.merge(df2.rename(columns={'origin':'iso3_o', 'target':'iso3_d'}), how='outer')
Out[114]:
  iso3_o iso3_d  year  value1  value2  value_3  value_4
0    pak    tza  2000   123.0   456.0    763.0    987.0
1    lby    vnm  2000   435.0   148.0      NaN      NaN
2    can    jpn  2001   983.0    95.0      NaN      NaN
3    civ     pa  2001   109.0   265.0      NaN      NaN
4    bol    slv  2004    19.0   239.0     47.0    384.0
5    lby    vnm  2001     NaN     NaN    349.0    274.0
6    can    jpn  2002     NaN     NaN    238.0     95.0
7    chl    geo  2000     NaN     NaN    109.0    236.0
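
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It is not part of the original answer: the frames are rebuilt by hand from the tables in the question (leading zeros such as 095 are dropped because they are not valid Python integer literals), and the names keys, renamed, merged and flagged are only illustrative. It passes the three key columns explicitly through on= rather than relying on the default intersection, and uses indicator=True to show which side each row came from. Depending on the pandas version, the row order of an outer merge may differ from the output above.

```python
import pandas as pd

# Sample data rebuilt by hand from the question's tables (an assumption:
# the real data may be strings; here the value columns are plain integers).
df1 = pd.DataFrame({
    "iso3_o": ["pak", "lby", "can", "civ", "bol"],
    "iso3_d": ["tza", "vnm", "jpn", "pa", "slv"],
    "year": [2000, 2000, 2001, 2001, 2004],
    "value1": [123, 435, 983, 109, 19],
    "value2": [456, 148, 95, 265, 239],
})
df2 = pd.DataFrame({
    "origin": ["pak", "lby", "can", "chl", "bol"],
    "target": ["tza", "vnm", "jpn", "geo", "slv"],
    "year": [2000, 2001, 2002, 2000, 2004],
    "value_3": [763, 349, 238, 109, 47],
    "value_4": [987, 274, 95, 236, 384],
})

# Give df2 the same key column names as df1.
renamed = df2.rename(columns={"origin": "iso3_o", "target": "iso3_d"})

# Name the three key columns explicitly instead of relying on the default.
keys = ["iso3_o", "iso3_d", "year"]
merged = df1.merge(renamed, on=keys, how="outer")

# indicator=True adds a "_merge" column: left_only, right_only or both.
flagged = df1.merge(renamed, on=keys, how="outer", indicator=True)
counts = flagged["_merge"].value_counts()

print(merged)
print(counts)
```

Naming the keys explicitly guards against the two DFs later gaining another column with a shared name, which the default would silently add to the join condition. The value columns come out as floats because NaN cannot be stored in an integer column.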