I have two pandas DataFrames:
import pandas as pd
df1 = pd.DataFrame({'Counterparty':['Bank','Client','Bank','Bank'],
'Maturity':[200, 400, 200, 400],
'Amount':[100, 100, 100, 100],
'Factor':[0,0,0,0]})
df2 = pd.DataFrame({'Counterparty':['Bank','Client','Client'],
'Maturity_Condition':['*', '<50', '>=50'],
'Factor':[1,0.5,0.7]})
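I want to fill in the Factor column of df1 based on the conditions defined in df2. A '*' means the condition should be ignored. So, given the data in df2, if the Counterparty is Bank, the Factor is always 1 (regardless of Maturity). If the Counterparty is Client, the Factor should be 0.5 or 0.7 depending on Maturity. For the example above, I want to end up with: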
df3=pd.DataFrame({'Counterparty':['Bank','Client','Bank','Bank'],
'Maturity':[200, 400, 200, 400],
'Amount':[100, 100, 100, 100],
'Factor':[1,0.7,1,1]})
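Apart from using boolean masks and a long list of if statements, does anyone have a more elegant way to achieve this?
Answer 0 (score: 1)
You can try merging the DataFrames, then evaluating each row's condition and keeping only the rows that satisfy it: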
>>> # Drop df1's placeholder Factor first, otherwise the merge yields Factor_x / Factor_y
>>> df_merged = df1.drop(columns='Factor').merge(df2, how='left', on='Counterparty')
>>> df_merged
  Counterparty  Maturity  Amount Maturity_Condition  Factor
0         Bank       200     100                  *     1.0
1       Client       400     100                <50     0.5
2       Client       400     100               >=50     0.7
3         Bank       200     100                  *     1.0
4         Bank       400     100                  *     1.0
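The left merge produces one row for every pair of a df1 row and a matching df2 rule, which is why the Client row shows up twice. A minimal sketch that makes this fan-out visible by carrying the original df1 index along; the column name `orig_row` is introduced here purely for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'Counterparty': ['Bank', 'Client', 'Bank', 'Bank'],
                    'Maturity': [200, 400, 200, 400],
                    'Amount': [100, 100, 100, 100]})
df2 = pd.DataFrame({'Counterparty': ['Bank', 'Client', 'Client'],
                    'Maturity_Condition': ['*', '<50', '>=50'],
                    'Factor': [1, 0.5, 0.7]})

# Keep the df1 index as a column (hypothetical name 'orig_row') so every
# merged row can be traced back to the df1 row it came from.
merged = (df1.reset_index()
             .rename(columns={'index': 'orig_row'})
             .merge(df2, how='left', on='Counterparty'))

# Candidate rules per original row: each Bank row has 1, the Client row has 2.
rules_per_row = merged.groupby('orig_row').size()
print(rules_per_row.tolist())  # [1, 2, 1, 1]
```

After filtering by the conditions, each `orig_row` should appear exactly once if the rules in df2 are exhaustive and non-overlapping.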
Replace the * condition with a condition that is always True:
>>> (df_merged['Maturity'].astype(str) +
...  df_merged['Maturity_Condition'].replace('*', '!=np.nan'))
0    200!=np.nan
1         400<50
2        400>=50
3    200!=np.nan
4    400!=np.nan
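The `!=np.nan` trick relies on NaN comparing unequal to every value, including itself, so the resulting expression is True for any Maturity. A quick check of that behaviour, given only as an illustration:

```python
import numpy as np

# NaN compares unequal to everything, so "<number> != np.nan" is always True.
checks = [200 != np.nan, 400 != np.nan, 0 != np.nan, np.nan != np.nan]
print(checks)  # [True, True, True, True]
```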
Build a mask that marks the rows whose condition holds:
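>>> import numpy as np  # the '!=np.nan' strings refer to np when evaluated
>>> mask = (df_merged['Maturity'].astype(str) +
...         df_merged['Maturity_Condition'].replace('*', '!=np.nan')
...        ).apply(lambda expr: eval(expr, {'np': np}))  # pass np explicitly to eval
>>> mask
0     True
1    False
2     True
3     True
4     True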
Finally, apply the mask to keep the rows that satisfy their condition, and select the columns you need:
>>> df_merged[mask][['Counterparty', 'Maturity', 'Amount', 'Factor']].reset_index(drop=True)
  Counterparty  Maturity  Amount  Factor
0         Bank       200     100     1.0
1       Client       400     100     0.7
2         Bank       200     100     1.0
3         Bank       400     100     1.0
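If you would rather not run `eval` on strings, the same idea can be sketched with the standard `operator` module. This is only one possible variant, not part of the approach above: the helper `matches`, the `OPS` table and the `CONDITION` pattern are names made up for this example, and it assumes every condition is either '*' or a single comparison operator followed by a number.

```python
import operator
import re

import pandas as pd

# Comparison symbols supported by this sketch; anything else raises ValueError.
OPS = {'<': operator.lt, '<=': operator.le, '>': operator.gt,
       '>=': operator.ge, '==': operator.eq, '!=': operator.ne}
# Two-character operators are listed first so '>=' is not read as '>'.
CONDITION = re.compile(r'\s*(<=|>=|==|!=|<|>)\s*(-?\d+(?:\.\d+)?)\s*')


def matches(value, condition):
    """Return True if value satisfies condition; '*' matches anything."""
    if condition == '*':
        return True
    parsed = CONDITION.fullmatch(condition)
    if parsed is None:
        raise ValueError(f'unsupported condition: {condition!r}')
    symbol, threshold = parsed.groups()
    return bool(OPS[symbol](value, float(threshold)))


df1 = pd.DataFrame({'Counterparty': ['Bank', 'Client', 'Bank', 'Bank'],
                    'Maturity': [200, 400, 200, 400],
                    'Amount': [100, 100, 100, 100],
                    'Factor': [0, 0, 0, 0]})
df2 = pd.DataFrame({'Counterparty': ['Bank', 'Client', 'Client'],
                    'Maturity_Condition': ['*', '<50', '>=50'],
                    'Factor': [1, 0.5, 0.7]})

# Same merge as above: one row per (df1 row, candidate rule) pair.
merged = df1.drop(columns='Factor').merge(df2, how='left', on='Counterparty')
mask = [matches(v, c)
        for v, c in zip(merged['Maturity'], merged['Maturity_Condition'])]
result = (merged[mask][['Counterparty', 'Maturity', 'Amount', 'Factor']]
          .reset_index(drop=True))
print(result['Factor'].tolist())  # [1.0, 0.7, 1.0, 1.0]
```

Parsing the condition explicitly means a malformed rule in df2 fails loudly instead of being executed as arbitrary Python.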
Hope this helps.