I need to combine two different datasets, matching rows only when three columns have the same values. For example:
DF1
iso3_o iso3_d year value1 value2
pak tza 2000 123 456
lby vnm 2000 435 148
can jpn 2001 983 095
civ pa 2001 109 265
bol slv 2004 019 239
DF2
origin target year value_3 value_4
pak tza 2000 763 987
lby vnm 2001 349 274
can jpn 2002 238 095
chl geo 2000 109 236
bol slv 2004 047 384
So, for rows to be combined, the values must satisfy the following condition:
df1['iso3_o'] == df2['origin'] AND df1['iso3_d'] == df2['target'] AND df1['year'] == df2['year']
This is the combined table I need to get:
iso3_o iso3_d year value1 value2 value_3 value_4
pak tza 2000 123 456 763 987
lby vnm 2000 435 148 NaN NaN
lby vnm 2001 NaN NaN 349 274
can jpn 2001 983 095 NaN NaN
can jpn 2002 NaN NaN 238 095
civ pa 2001 109 265 NaN NaN
bol slv 2004 019 239 047 384
chl geo 2000 NaN NaN 109 236
Answer 0 (score: 3)
IIUC, we can rename the columns in one of the DataFrames so that the join columns have the same names in both. DataFrame.merge() joins on the intersection of the columns by default:
In [114]: df1.merge(df2.rename(columns={'origin':'iso3_o', 'target':'iso3_d'}), how='outer')
Out[114]:
iso3_o iso3_d year value1 value2 value_3 value_4
0 pak tza 2000 123.0 456.0 763.0 987.0
1 lby vnm 2000 435.0 148.0 NaN NaN
2 can jpn 2001 983.0 95.0 NaN NaN
3 civ pa 2001 109.0 265.0 NaN NaN
4 bol slv 2004 19.0 239.0 47.0 384.0
5 lby vnm 2001 NaN NaN 349.0 274.0
6 can jpn 2002 NaN NaN 238.0 95.0
7 chl geo 2000 NaN NaN 109.0 236.0
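For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same rename-then-merge idea. The frames are rebuilt by hand from the tables in the question (values entered as integers, so leading zeros such as 095 are not preserved). The explicit `on=keys` list and `indicator=True` are my additions, not part of the original one-liner: the first just spells out the default join columns, and the second adds a `_merge` column showing which side each row came from.

```python
import pandas as pd

# Sample frames rebuilt from the question's tables.
df1 = pd.DataFrame({
    "iso3_o": ["pak", "lby", "can", "civ", "bol"],
    "iso3_d": ["tza", "vnm", "jpn", "pa", "slv"],
    "year": [2000, 2000, 2001, 2001, 2004],
    "value1": [123, 435, 983, 109, 19],
    "value2": [456, 148, 95, 265, 239],
})
df2 = pd.DataFrame({
    "origin": ["pak", "lby", "can", "chl", "bol"],
    "target": ["tza", "vnm", "jpn", "geo", "slv"],
    "year": [2000, 2001, 2002, 2000, 2004],
    "value_3": [763, 349, 238, 109, 47],
    "value_4": [987, 274, 95, 236, 384],
})

# Columns that must match in both frames after renaming.
keys = ["iso3_o", "iso3_d", "year"]

merged = df1.merge(
    # Align df2's key names with df1's so the keys can be shared.
    df2.rename(columns={"origin": "iso3_o", "target": "iso3_d"}),
    on=keys,         # explicit version of the default column intersection
    how="outer",     # keep rows that exist in only one frame
    indicator=True,  # adds a "_merge" column: both / left_only / right_only
)
print(merged)
```

Row order in an outer merge can differ between pandas versions, so sort by the key columns if the order matters. Renaming first, rather than passing `left_on`/`right_on`, also avoids ending up with both `iso3_o` and `origin` (and `iso3_d` and `target`) in the result.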