我想通过检查它们是否包含相同的文档编号,将一个数据框中的值附加到另一个数据框中。
ip_df:
CardName DocNum DocDate DocTotal DocNum2 PaidToDate Balance
0 CompanyA 800100001 2021-03-01 10000.000000 920000000 10000.000000 0.000000
1 CompanyA 800100002 2021-03-01 20000.000000 920000000 20000.000000 0.000000
2 CompanyA 800100003 2021-03-01 30000.000000 920000000 30000.000000 0.000000
3 CompanyA 800100004 2021-03-01 40000.000000 920000000 40000.000000 0.000000
4 CompanyA 800100005 2021-03-01 50000.000000 920000000 50000.000000 0.000000
.. ... ... ... ... ... ... ...
94 CompanyY 800100006 2021-03-01 60000.000000 920000005 60000.000000 0.000000
95 CompanyY 800100007 2021-03-01 70000.000000 920000005 70000.000000 0.000000
96 CompanyY 800100008 2021-03-01 80000.000000 920000005 80000.000000 0.000000
97 CompanyZ 800100009 2021-03-01 90000.000000 920000006 90000.000000 0.000000
98 CompanyZ 800100010 2021-03-01 11000.000000 920000006 11000.000000 0.000000
[99 rows x 7 columns]
ar_df:
BPCode Balance Currency DueDate BPName TransId Ref1 Payment Received Bank Charge
0 XXXXXXXX 10000.000000 USD 2020-09-29 CompanyA 503378 800100001 0 0
1 XXXXXXXX 20000.000000 USD 2021-03-01 CompanyA 543103 800100002 0 0
2 XXXXXXXX 30000.000000 USD 2021-03-01 CompanyA 543171 800100003 0 0
3 XXXXXXXX 40000.000000 USD 2021-03-01 CompanyA 544205 800100004 0 0
4 XXXXXXXX 50000.000000 USD 2021-03-01 CompanyA 544222 800100005 0 0
... ... ... ... ... ... ... ... ... ...
3763 XXXXXXXX 60000.000000 USD 2021-03-02 CompanyY 548612 800100006 0 0
3764 XXXXXXXX 70000.000000 USD 2021-03-02 CompanyY 547727 800100007 0 0
3765 XXXXXXXX 80000.000000 USD 2021-03-30 CompanyY 553819 800100008 0 0
3766 XXXXXXXX 90000.000000 USD 2021-04-01 CompanyZ 547707 800100009 0 0
3767 XXXXXXXX 11000.000000 USD 2021-04-29 CompanyZ 556102 800100010 0 0
[3768 rows x 9 columns]
我正在尝试这样做:
for row in ip_df:
if ip_df.row['DocNum'] == ar_df.row['Ref1']:
ap_df.row['Payment Received'] = ip_df.row['PaidToDate']
但我不太确定这样做的正确方法。
我尝试四处搜索,但大多数情况下,答案似乎总是指向仅检查标量值。
我想要实现的是检查ip_df中是否在一行中,是否可以在ap_df的'Ref1'列中找到'DocNum',如果为真,则将ap_df的'Payment Received'行设置为来自ip_df's的值PaidToDate' 行。
最终结果应如下所示:
BPCode Balance Currency DueDate BPName TransId Ref1 Payment Received Bank Charge
0 XXXXXXXX 10000.000000 USD 2020-09-29 CompanyA 503378 800100001 10000.000000 0
1 XXXXXXXX 20000.000000 USD 2021-03-01 CompanyA 543103 800100002 20000.000000 0
2 XXXXXXXX 30000.000000 USD 2021-03-01 CompanyA 543171 800100003 30000.000000 0
3 XXXXXXXX 40000.000000 USD 2021-03-01 CompanyA 544205 800100004 40000.000000 0
4 XXXXXXXX 50000.000000 USD 2021-03-01 CompanyA 544222 800100005 50000.000000 0
... ... ... ... ... ... ... ... ... ...
3763 XXXXXXXX 60000.000000 USD 2021-03-02 CompanyY 548612 800114258 60000.000000 0
3764 XXXXXXXX 70000.000000 USD 2021-03-02 CompanyY 547727 800113975 70000.000000 0
3765 XXXXXXXX 80000.000000 USD 2021-03-30 CompanyY 553819 800115292 80000.000000 0
3766 XXXXXXXX 90000.000000 USD 2021-04-01 CompanyZ 547707 800113957 90000.000000 0
3767 XXXXXXXX 11000.000000 USD 2021-04-29 CompanyZ 556102 800115741 11000.000000 0
[3768 rows x 9 columns]
提前感谢您的帮助!!
答案 0 :(得分:2)
在带有 df.merge
的列子集上使用 df.rename
:
In [220]: ar_df['Ref1'] = ar_df['Ref1'].str.replace('', np.nan).astype(int)
In [221]: ip_df['DocNum'] = ar_df['DocNum'].str.replace('', np.nan).astype(int)
In [222]: ar_df.merge(ip_df[['DocNum', 'PaidToDate']], left_on='Ref1', right_on='DocNum').drop(['Payment Received', 'DocNum'], 1).rename(columns={'PaidToDate':'Payment Received'})
Out[222]:
BPCode Balance Currency DueDate BPName TransId Ref1 Bank_Charge Payment_Received
0 XXXXXXXX 10000.0 USD 2020-09-29 CompanyA 503378 800100001 0 10000.0
1 XXXXXXXX 20000.0 USD 2021-03-01 CompanyA 543103 800100002 0 20000.0
2 XXXXXXXX 30000.0 USD 2021-03-01 CompanyA 543171 800100003 0 30000.0
3 XXXXXXXX 40000.0 USD 2021-03-01 CompanyA 544205 800100004 0 40000.0
4 XXXXXXXX 50000.0 USD 2021-03-01 CompanyA 544222 800100005 0 50000.0