I have two pandas DataFrames: one is a lookup table and the other is the "main" table.
The lookup table looks like this:
import pandas as pd
lu_dict = {'state': ['OH', 'TX', 'IA', 'WY', 'KS'], 'fire_pct':[0.542630,.174425,0.206752,0.004621,0.441946]
, 'hail_pct':[0.008787,0.440272,0.422005,0.434709,0.312338]
,'tw_pct':[0.101449,0.179536,0.159886,0.028349,0.151416]
,'other_pct':[0.224980,0.160096,0.149560,0.393357,0.036523]
,'wp_pct':[0.122154,0.045671,0.061796,0.138963,0.057777]}
lu = pd.DataFrame(lu_dict)
The main table looks like this:
preds_dict = {'state':['OH', 'TX', 'IA', 'WY', 'KS'],
'fire_preds':[.01,.02,.03,.015,.66]
, 'hail_preds':[.03,.005,.12,.23,.006]
,'tw_preds':[.001,.02,.0035,.04,.02]
,'other_preds':[.003,.05,.001,.01,.06]
,'wp_preds':[.002,.03,.005,.01,.04]}
preds = pd.DataFrame(preds_dict)
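I need to match the observations in the "main" table to the lookup table on the state column, then multiply fire_pct from the lookup table by fire_preds from the main table, other_pct by other_preds, wp_pct by wp_preds, and so on.
If a dictionary would work better for the lookup table, that is fine. I just need to keep the main table as a DataFrame in its current form for further processing.
Ultimately, the output I am looking for is the sum of those products in a single column.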
Answer 0 (score: 1)
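IIUC, you need to rename the lookup columns so that pandas aligns the data correctly. Once both frames are indexed by state and share the same column names, multiplying them aligns on both the row and the column labels.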
mults = (lu.rename(columns=dict(zip(lu.columns, preds.columns))).set_index('state') *
preds.set_index('state'))
print(mults)
Output:
       fire_preds  hail_preds  tw_preds  other_preds  wp_preds
state
OH       0.005426    0.000264  0.000101     0.000675  0.000244
TX       0.003488    0.002201  0.003591     0.008005  0.001370
IA       0.006203    0.050641  0.000560     0.000150  0.000309
WY       0.000069    0.099983  0.001134     0.003934  0.001390
KS       0.291684    0.001874  0.003028     0.002191  0.002311
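Note that the zip-based rename pairs the columns by position, so it relies on both frames listing their columns in the same order. If that is not guaranteed, a possible variation is to rename by suffix instead. The following is only a sketch: it assumes every lookup column ends in _pct and has a matching _preds column, and it uses just the OH and TX rows and the fire/hail columns from the question, deliberately shuffled.

```python
import pandas as pd

# Two states and two perils taken from the question's data.
lu = pd.DataFrame({
    'state': ['OH', 'TX'],
    'fire_pct': [0.542630, 0.174425],
    'hail_pct': [0.008787, 0.440272],
})

# Same data as the main table, but with rows and columns in a different order.
preds = pd.DataFrame({
    'state': ['TX', 'OH'],
    'hail_preds': [0.005, 0.03],
    'fire_preds': [0.02, 0.01],
})

# Rename by suffix so the pairing depends on names, not on column position
# (assumes the '_pct' / '_preds' naming convention holds for every column).
lu_renamed = lu.set_index('state').rename(
    columns=lambda c: c.replace('_pct', '_preds')
)

# DataFrame * DataFrame aligns on both index (state) and column labels.
mults = lu_renamed * preds.set_index('state')
print(mults)
```

Because the multiplication aligns on labels, the shuffled row and column order in preds does not affect the result.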
Sum of each column:
mults.sum()
fire_preds 0.306871
hail_preds 0.154963
tw_preds 0.008414
other_preds 0.014954
wp_preds 0.005624
dtype: float64
Sum by state:
mults.sum(axis=1)
state
OH 0.006711
TX 0.018656
IA 0.057861
WY 0.106510
KS 0.301089
dtype: float64
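Since the question asks for the total as a single column while keeping the main table as a DataFrame, one way to attach the per-state sum back onto preds might look like the sketch below. The column name weighted_sum is a made-up name for illustration, and the sketch assumes each state appears only once in the lookup table.

```python
import pandas as pd

lu = pd.DataFrame({
    'state': ['OH', 'TX', 'IA', 'WY', 'KS'],
    'fire_pct': [0.542630, 0.174425, 0.206752, 0.004621, 0.441946],
    'hail_pct': [0.008787, 0.440272, 0.422005, 0.434709, 0.312338],
    'tw_pct': [0.101449, 0.179536, 0.159886, 0.028349, 0.151416],
    'other_pct': [0.224980, 0.160096, 0.149560, 0.393357, 0.036523],
    'wp_pct': [0.122154, 0.045671, 0.061796, 0.138963, 0.057777],
})
preds = pd.DataFrame({
    'state': ['OH', 'TX', 'IA', 'WY', 'KS'],
    'fire_preds': [.01, .02, .03, .015, .66],
    'hail_preds': [.03, .005, .12, .23, .006],
    'tw_preds': [.001, .02, .0035, .04, .02],
    'other_preds': [.003, .05, .001, .01, .06],
    'wp_preds': [.002, .03, .005, .01, .04],
})

# Element-wise products, aligned on state, as in the answer above.
mults = (lu.rename(columns=dict(zip(lu.columns, preds.columns))).set_index('state')
         * preds.set_index('state'))

# Look up each row's state in the per-state totals and store the result
# in a new column; 'weighted_sum' is a hypothetical name.
preds['weighted_sum'] = preds['state'].map(mults.sum(axis=1))
print(preds[['state', 'weighted_sum']])
```

Using map on the state column keeps preds in its original row order and leaves its existing columns untouched.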