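I want to merge the Result column of two pandas DataFrames into one. The merged Result column must not end up with conflicting values. In both DataFrames, the two columns id and sub_id together form the unique identifier.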
The first DataFrame looks like this:

```
   id sub_id           Result
0  G1     00
1  G1     F1  under-reporting
2  G2     N1  under-reporting
```
The second DataFrame looks like this:

```
   id sub_id          Result
0  G3     W1  over-reporting
1  G3     00  over-reporting
2  G4     K5
```
If a record contains neither under-reporting nor over-reporting, I want to fill it with the string pass.
So the output should look like this:

```
   id sub_id           Result
0  G1     00             pass
1  G1     F1  under-reporting
2  G2     N1  under-reporting
3  G3     W1   over-reporting
4  G3     00   over-reporting
5  G4     K5             pass
```
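A minimal sketch of one way to get that output, assuming the blank Result cells are NaN: stack both frames with pd.concat, reject any (id, sub_id) pair that carries two different non-null labels, keep the first non-null Result per pair, and fill whatever is still missing with "pass". The names df1 and df2 are placeholders for the two DataFrames shown above.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the tables above (assumption: blank cells are NaN).
df1 = pd.DataFrame({
    "id": ["G1", "G1", "G2"],
    "sub_id": ["00", "F1", "N1"],
    "Result": [np.nan, "under-reporting", "under-reporting"],
})
df2 = pd.DataFrame({
    "id": ["G3", "G3", "G4"],
    "sub_id": ["W1", "00", "K5"],
    "Result": ["over-reporting", "over-reporting", np.nan],
})

keys = ["id", "sub_id"]
stacked = pd.concat([df1, df2], ignore_index=True)

# Count distinct non-null labels per key; more than one means a conflict.
conflicts = stacked.groupby(keys)["Result"].nunique()
if (conflicts > 1).any():
    raise ValueError(f"conflicting Result values for: {list(conflicts[conflicts > 1].index)}")

# first() takes the first non-null Result per key; sort=False keeps the
# order in which each key first appears.
output_df = (
    stacked.groupby(keys, sort=False, as_index=False)["Result"]
    .first()
    .fillna({"Result": "pass"})
)
print(output_df)
```

Because first() skips missing values, a label present in either frame wins over a blank in the other. The nunique() check is what enforces the "no conflicting values" requirement; raising an error is just one possible policy.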
Here is the code I am currently using:
```python
import numpy as np
import pandas as pd

# Use a joint mask to select the reportable deals
reportable_deals = df[joint_logic_of_reportable_deals]
under_reporting_df = reportable_deals[['id', 'sub_id']].copy()
# Left merge to flag under-reporting deals (reportable deals not in trade_state_df)
under_reporting_df = under_reporting_df.merge(trade_state_df, how='left', on=['id', 'sub_id'], indicator='Result')
under_reporting_df['Result'] = under_reporting_df['Result'].map({
    'both': np.nan,
    'left_only': 'under-reporting',
    'right_only': np.nan,
})
# Select the non-reportable deals with the inverse of the joint mask
not_reportable_deals = df_data_store[~joint_logic_of_reportable_deals]
over_reporting_df = not_reportable_deals[['id', 'sub_id']].copy()
over_reporting_df['sub_id'] = over_reporting_df['sub_id'].astype(str).str.zfill(2)
# Left merge to flag over-reporting deals (non-reportable but present in trade_state_df)
over_reporting_df = over_reporting_df.merge(trade_state_df, how='left', on=['id', 'sub_id'], indicator='Result')
# None (not np.nan) keeps the "no match" rows truly missing; np.where would turn np.nan into the string 'nan'
over_reporting_df['Result'] = np.where(over_reporting_df['Result'] == 'both', 'over-reporting', None)
output_df = pd.concat([under_reporting_df, over_reporting_df], ignore_index=True)
header = ['id', 'sub_id', 'Result']
output_df.to_csv("Eligibility Result.csv", columns=header)
```
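A side issue in this code: when np.where mixes a string with np.nan, NumPy chooses a string dtype for the result, so the missing values become the literal text "nan" and a later fillna("pass") will not touch them. A small sketch on a made-up indicator column (illustrative data only, not from the question):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the merge indicator column (hypothetical values).
indicator = pd.Series(["both", "left_only", "both"])

# String + np.nan -> string array, so NaN is stored as the text "nan".
as_text = pd.Series(np.where(indicator == "both", "over-reporting", np.nan))
print(as_text.isna().sum())  # 0

# String + None -> object array, so pandas sees a real missing value.
as_missing = pd.Series(np.where(indicator == "both", "over-reporting", None))
print(as_missing.isna().sum())  # 1
```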
However, after the concat, output_df has 7 more deals than the original df.

Thanks a lot for your help.
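A likely cause, though the question alone cannot confirm it: a left merge returns one row per match, so any (id, sub_id) pair that appears more than once in trade_state_df multiplies the matching deal. (The two halves are also built from different frames, df and df_data_store, so it is worth checking that those have the same length.) The sketch below uses hypothetical data to show the inflation, then two guards: dropping duplicate keys on the right side, which loses nothing here because the merge is only used as an existence check, and validate="many_to_one", which makes pandas raise if the right-side keys are not unique.

```python
import pandas as pd

keys = ["id", "sub_id"]

# Hypothetical data: the deal (G1, F1) has two records on the right side.
deals = pd.DataFrame({"id": ["G1", "G2"], "sub_id": ["F1", "N1"]})
trade_state_df = pd.DataFrame({
    "id": ["G1", "G1"],
    "sub_id": ["F1", "F1"],
    "state": ["new", "amended"],  # hypothetical extra column
})

merged = deals.merge(trade_state_df, how="left", on=keys, indicator="Result")
print(len(deals), len(merged))  # 2 3

# Inspect the duplicated keys on the right before merging.
dupes = trade_state_df.duplicated(subset=keys, keep=False)
print(trade_state_df[dupes])

# Keep one row per key, and let pandas verify the right side is unique.
right = trade_state_df.drop_duplicates(subset=keys)
checked = deals.merge(right, how="left", on=keys,
                      indicator="Result", validate="many_to_one")
print(len(checked))  # 2
```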
Answer 0 (score: 0)
Assuming the missing values are NaN, you could try fillna:

```python
(df1.set_index(["id", "sub_id"])
    .fillna(df2.set_index(["id", "sub_id"]))
    .fillna("pass")
    .reset_index())
```
Result:

```
   id sub_id           Result
0  G1     00   over-reporting
1  G1     F1  under-reporting
2  G2     N1  under-reporting
3  G3     W1             pass
4  G3     00             pass
5  G4     K5   over-reporting
```
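Note that DataFrame.fillna with another DataFrame only fills cells whose (id, sub_id) labels already exist in the first frame. With the question's sample data the two frames share no keys, so this approach returns only df1's three rows. A sketch comparing it with DataFrame.combine_first, which takes the union of both indexes, is below (sample frames rebuilt from the question, blanks assumed to be NaN). Two caveats: combine_first may reorder rows because it aligns the two indexes, and where both frames have a value, df1 silently wins instead of flagging a conflict.

```python
import numpy as np
import pandas as pd

keys = ["id", "sub_id"]
df1 = pd.DataFrame({"id": ["G1", "G1", "G2"], "sub_id": ["00", "F1", "N1"],
                    "Result": [np.nan, "under-reporting", "under-reporting"]})
df2 = pd.DataFrame({"id": ["G3", "G3", "G4"], "sub_id": ["W1", "00", "K5"],
                    "Result": ["over-reporting", "over-reporting", np.nan]})

# fillna(other) fills only labels already present in df1: no new rows.
filled = (df1.set_index(keys)
          .fillna(df2.set_index(keys))
          .fillna("pass")
          .reset_index())
print(len(filled))  # 3

# combine_first keeps df1's values and adds df2's rows and values where df1 has none.
combined = (df1.set_index(keys)
            .combine_first(df2.set_index(keys))
            .fillna("pass")
            .reset_index())
print(combined)
```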