Pandas: mapping data from 2 dataframes with different column names

Time: 2019-10-29 07:37:09

Tags: python pandas

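I tried to map these two dataframes but failed, probably because the column names differ and the month values are formatted differently.

I want to create a new dataframe like dfNew below.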

df

Employee ID Employee Name   Activity Month
A0001       John Smith      Apr-19
A0002       Will Cornor     Apr-19
A0001       John Smith      May-19
A0003       David Teo       May-19
A0001       John Smith      May-19
A0002       Will Cornor     Jun-19
A0001       John Smith      Jun-19

df2

Month       Bonus
2019-04-01  5000
2019-05-01  4000
2019-06-01  6000

dfNew

Employee ID Employee Name   Activity Month  Bonus
A0001       John Smith      Apr-19          5000
A0002       Will Cornor     Apr-19          5000
A0001       John Smith      May-19          4000
A0003       David Teo       May-19          4000
A0001       John Smith      May-19          4000
A0002       Will Cornor     Jun-19          6000
A0001       John Smith      Jun-19          6000

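2 answers:

Answer 0 (score: 4)

Use Series.dt.strftime to convert the datetimes in df2 to the same text format as Activity Month, which makes it possible to use Series.map (df1 below is the question's df):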

s = df2.set_index(df2['Month'].dt.strftime('%b-%y'))['Bonus']
df1['Bonus'] = df1['Activity Month'].map(s)
print(df1)
  Employee ID Employee Name Activity Month  Bonus
0       A0001    John Smith         Apr-19   5000
1       A0002   Will Cornor         Apr-19   5000
2       A0001    John Smith         May-19   4000
3       A0003     David Teo         May-19   4000
4       A0001    John Smith         May-19   4000
5       A0002   Will Cornor         Jun-19   6000
6       A0001    John Smith         Jun-19   6000
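
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the map approach. The frames are rebuilt by hand from the question's tables, and it assumes that Month is already a datetime column and that month abbreviations are rendered in an English locale. The name bonus_by_month is only illustrative.

```python
import pandas as pd

# Data copied from the question; df1 is the question's `df`
df1 = pd.DataFrame({
    "Employee ID": ["A0001", "A0002", "A0001", "A0003", "A0001", "A0002", "A0001"],
    "Employee Name": ["John Smith", "Will Cornor", "John Smith", "David Teo",
                      "John Smith", "Will Cornor", "John Smith"],
    "Activity Month": ["Apr-19", "Apr-19", "May-19", "May-19", "May-19", "Jun-19", "Jun-19"],
})
df2 = pd.DataFrame({
    # Assumption: Month is stored as datetime64, not as text
    "Month": pd.to_datetime(["2019-04-01", "2019-05-01", "2019-06-01"]),
    "Bonus": [5000, 4000, 6000],
})

# Lookup Series: index holds 'Apr-19' style labels, values hold the bonus
bonus_by_month = df2.set_index(df2["Month"].dt.strftime("%b-%y"))["Bonus"]

# map() looks up each Activity Month in the Series index
df1["Bonus"] = df1["Activity Month"].map(bonus_by_month)
print(df1)
```

Any Activity Month that has no counterpart in df2 would come back as NaN, so a quick df1['Bonus'].isna().any() check can confirm that every row matched.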

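Alternatively, use DataFrame.merge. DataFrame.pop removes the original Month column while its formatted values become the new Activity Month key column: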

df2['Activity Month'] = df2.pop('Month').dt.strftime('%b-%y')
df1 = df1.merge(df2, on='Activity Month', how='left')
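
Note that pop modifies df2 in place. If df2 is needed again later, a variant that leaves it untouched might look like the sketch below. It is not the answer's exact code: the lookup and merged names are illustrative, and validate='many_to_one' is an optional extra that makes merge raise if a month appears more than once in the bonus table.

```python
import pandas as pd

# Data copied from the question; df1 is the question's `df`
df1 = pd.DataFrame({
    "Employee ID": ["A0001", "A0002", "A0001", "A0003", "A0001", "A0002", "A0001"],
    "Employee Name": ["John Smith", "Will Cornor", "John Smith", "David Teo",
                      "John Smith", "Will Cornor", "John Smith"],
    "Activity Month": ["Apr-19", "Apr-19", "May-19", "May-19", "May-19", "Jun-19", "Jun-19"],
})
df2 = pd.DataFrame({
    # Assumption: Month is stored as datetime64, not as text
    "Month": pd.to_datetime(["2019-04-01", "2019-05-01", "2019-06-01"]),
    "Bonus": [5000, 4000, 6000],
})

# Build the join key on a copy so that df2 keeps its Month column
lookup = (
    df2.assign(**{"Activity Month": df2["Month"].dt.strftime("%b-%y")})
       .drop(columns="Month")
)

# Left join keeps every row of df1; validate guards against duplicate months
merged = df1.merge(lookup, on="Activity Month", how="left", validate="many_to_one")
print(merged)
```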

Answer 1 (score: 0)

This answer is based on @jezrael's suggestion.

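import pandas as pd

# Normalise both key columns to the same 'Apr-19' style text
df1['Activity Month'] = pd.to_datetime(df1['Activity Month'], format='%b-%y').dt.strftime('%b-%y')
df2['Month'] = pd.to_datetime(df2['Month'], format='%Y-%m-%d').dt.strftime('%b-%y')

# Rename the key column to match df1, then left-join the bonus values
df2['Activity Month'] = df2.pop('Month')
df1 = df1.merge(df2, on='Activity Month', how='left')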
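
The explicit pd.to_datetime calls matter when Month was read in as plain text, since the .dt accessor only works on datetime-like columns. A small sketch of that case, assuming df2 came from a file where the dates were left as strings (the labels name is illustrative):

```python
import pandas as pd

# Assumption: Month arrived as text, e.g. from read_csv without parse_dates
df2 = pd.DataFrame({
    "Month": ["2019-04-01", "2019-05-01", "2019-06-01"],
    "Bonus": [5000, 4000, 6000],
})

# Calling df2["Month"].dt here would raise AttributeError, so parse first
parsed = pd.to_datetime(df2["Month"], format="%Y-%m-%d")

# Then format to the same 'Apr-19' style used by Activity Month
labels = parsed.dt.strftime("%b-%y")
print(labels.tolist())
```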