Pandas - 2 dataframes, add the index column of df1 to df2 based on a second column

Date: 2016-11-09 10:26:30

Tags: pandas merge match

I have 2 dataframes:

df1 (sample, has more columns):

+---+----------------+--------------+-----------+
|   |     Region     | Placement ID |   Units   |
+---+----------------+--------------+-----------+
| 0 | Western Europe | 1.10872E+13  | 367628.76 |
| 1 | Western Europe | 1.10872E+13  | 367628.76 |
| 2 | Western Europe | 1.10872E+13  | 74604.63  |
+---+----------------+--------------+-----------+

df2 (sample, has more columns):

+-----------+----------------+--------------+
| Creatives | Publisher Name | Placement ID |
+-----------+----------------+--------------+
| Temenos   | Quantcast      | 1.10872E+13  |
| Temenos   | Quantcast      | 1.10872E+13  |
| Temenos   | Quantcast      | 1.10872E+13  |
+-----------+----------------+--------------+

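What I want to do is add an extra column to dataframe 2 that holds the index of dataframe 1, matched on Placement ID.

Some Placement ID fields in dataframe 1 or 2 may be empty or contain erroneous values. If there is no match, or an error is found, I would like to add a Missing or Error value instead, for example N/A, Missing, or a blank.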

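1 answer:

Answer 0: (score: 1)

IIUC you need merge, but the duplicates are a problem, so first remove them with drop_duplicates, then select the column to add together with the column used for joining (Placement ID):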

import pandas as pd

# Left join: keep every row of df2 and attach Units from the de-duplicated df1
print(pd.merge(df2,
               df1.drop_duplicates('Placement ID')[['Units', 'Placement ID']],
               how='left',
               on='Placement ID'))


  Creatives Publisher Name  Placement ID      Units
0   Temenos      Quantcast  1.108720e+13  367628.76
1   Temenos      Quantcast  1.108720e+13  367628.76
2   Temenos      Quantcast  1.108720e+13  367628.76
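
For anyone who wants to run the merge above, here is a minimal self-contained sketch. It assumes the two frames contain only the columns shown in the question's tables and that Placement ID is stored as a float; the real data has more columns.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question (only the columns shown).
df1 = pd.DataFrame({
    "Region": ["Western Europe"] * 3,
    "Placement ID": [1.10872e13] * 3,
    "Units": [367628.76, 367628.76, 74604.63],
})
df2 = pd.DataFrame({
    "Creatives": ["Temenos"] * 3,
    "Publisher Name": ["Quantcast"] * 3,
    "Placement ID": [1.10872e13] * 3,
})

# Keep one row per Placement ID so each df2 row matches at most one df1 row.
lookup = df1.drop_duplicates("Placement ID")[["Units", "Placement ID"]]

# Left join: every df2 row is kept, Units comes from the first df1 row per ID.
result = pd.merge(df2, lookup, how="left", on="Placement ID")
print(result)
```

Note that drop_duplicates keeps the first occurrence by default, which is why Units is 367628.76 in every row.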

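If you need to add the index instead, use reset_index (the new column is named level_0 when df1 already has a column called index; otherwise it is named index):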

# Turn df1's index into a column, then attach it to df2 by Placement ID
print(pd.merge(df2,
               df1.drop_duplicates('Placement ID')
                  .reset_index()[['level_0', 'Placement ID']],
               how='left',
               on='Placement ID'))
  Creatives Publisher Name  Placement ID  level_0
0   Temenos      Quantcast  1.108720e+13        0
1   Temenos      Quantcast  1.108720e+13        0
2   Temenos      Quantcast  1.108720e+13        0
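
The question also asks for a placeholder where no match is found. A left merge leaves NaN in those rows, so one possible approach is to fill them afterwards. The sketch below uses hypothetical IDs (111.0, 222.0) and a hypothetical column name df1_index, neither of which comes from the original data. It also drops empty IDs from df1 first, because merge treats NaN keys as equal to each other and would otherwise pair an empty ID in df2 with an empty ID in df1.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 222.0 has no match in df1, and one ID in each frame is empty.
df1 = pd.DataFrame({"Placement ID": [111.0, 111.0, np.nan],
                    "Units": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({"Creatives": ["a", "b", "c"],
                    "Placement ID": [111.0, 222.0, np.nan]})

lookup = (df1.dropna(subset=["Placement ID"])   # empty IDs must not act as join keys
             .drop_duplicates("Placement ID")   # one row per ID, first occurrence wins
             .rename_axis("df1_index")          # name the index so reset_index yields a known column
             .reset_index()[["df1_index", "Placement ID"]])

result = pd.merge(df2, lookup, how="left", on="Placement ID")

# Unmatched rows hold NaN; cast to object before filling with a text placeholder.
result["df1_index"] = result["df1_index"].astype(object).fillna("N/A")
print(result)
```

Mixing numbers and the text N/A makes the column object dtype. If the column should stay numeric, leaving the NaN values in place is the simpler choice.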

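Removing the duplicates is necessary because merge multiplies rows that share a join key: the value 1.108720e+13 appears 3 times in df2 and in 3 rows of df1, so you get 3 x 3 = 9 rows, like: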

# No drop_duplicates here, so every df2 row pairs with every matching df1 row
print(pd.merge(df2,
               df1.reset_index()[['level_0', 'Placement ID']],
               how='left',
               on='Placement ID'))

  Creatives Publisher Name  Placement ID  level_0
0   Temenos      Quantcast  1.108720e+13        0
1   Temenos      Quantcast  1.108720e+13        1
2   Temenos      Quantcast  1.108720e+13        2
3   Temenos      Quantcast  1.108720e+13        0
4   Temenos      Quantcast  1.108720e+13        1
5   Temenos      Quantcast  1.108720e+13        2
6   Temenos      Quantcast  1.108720e+13        0
7   Temenos      Quantcast  1.108720e+13        1
8   Temenos      Quantcast  1.108720e+13        2
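
The row multiplication can be checked directly, and merge can be asked to refuse duplicated keys. The following sketch assumes pandas 0.21 or later, where merge accepts the validate argument; the frames are cut down to the columns needed for the demonstration.

```python
import pandas as pd

df1 = pd.DataFrame({"Placement ID": [1.10872e13] * 3,
                    "Units": [367628.76, 367628.76, 74604.63]})
df2 = pd.DataFrame({"Creatives": ["Temenos"] * 3,
                    "Placement ID": [1.10872e13] * 3})

# Without de-duplication every df2 row pairs with every matching df1 row.
multiplied = pd.merge(df2, df1, how="left", on="Placement ID")
print(len(df2), len(df1), len(multiplied))

# validate="many_to_one" makes merge raise if the right-hand keys are not unique.
try:
    pd.merge(df2, df1, how="left", on="Placement ID", validate="many_to_one")
    duplicates_detected = False
except pd.errors.MergeError:
    duplicates_detected = True
print(duplicates_detected)
```

Using validate this way turns a silent increase in row count into an explicit error, which can be useful when the real df1 is too large to inspect by eye.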