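I have two data frames that share the same dates and client IDs but have different amounts.
I am trying to build another data frame that takes the amount values from dfA, and keeps the 0 from dfB wherever a row does not exist in dfA.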
dfA:
client_id date amount
0 1 2020-07-11 100
1 1 2020-07-10 90
2 1 2020-07-09 80
3 1 2020-07-12 70
3 1 2020-07-01 86
dfB:
client_id date amount
0 1 2020-07-11 0
1 1 2020-07-10 0
2 1 2020-07-09 0
3 1 2020-07-07 0
4 1 2020-07-06 0
5 1 2020-07-05 0
5 1 2020-07-04 0
3 1 2020-07-03 0
4 1 2020-07-02 0
5 1 2020-07-01 0
I would like to get:
dfResult:
client_id date amount
0 1 2020-07-11 100
1 1 2020-07-10 90
2 1 2020-07-09 80
3 1 2020-07-07 70
4 1 2020-07-06 0
5 1 2020-07-05 0
5 1 2020-07-04 0
3 1 2020-07-03 0
4 1 2020-07-02 0
5 1 2020-07-01 86
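Answer 0: (score: 1)
You can concat the two data frames, sort by amount, and then drop the duplicates.
dfResult = (
    pd.concat([dfA, dfB])
    # put the larger (dfA) amount first for each client_id/date pair
    .sort_values(by='amount', ascending=False)
    .drop_duplicates(subset=['client_id', 'date'], keep='first')
    .sort_values(by=['client_id', 'date'], ascending=[True, False])
    .reset_index(drop=True)
)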
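A minimal, self-contained sketch of the concat approach, using the sample data from the question. Two points here are assumptions rather than part of the answer: the dates are kept as plain ISO strings, and an extra inner merge against dfB's keys is added so that dates present only in dfA (such as 2020-07-12) do not appear in the result. The trick of sorting by amount also relies on dfA's amounts never being negative.

```python
import pandas as pd

# Sample data copied from the question (dates kept as ISO strings).
dfA = pd.DataFrame({
    "client_id": [1, 1, 1, 1, 1],
    "date": ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-12", "2020-07-01"],
    "amount": [100, 90, 80, 70, 86],
})
dfB = pd.DataFrame({
    "client_id": [1] * 10,
    "date": ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-07", "2020-07-06",
             "2020-07-05", "2020-07-04", "2020-07-03", "2020-07-02", "2020-07-01"],
    "amount": [0] * 10,
})

# Stack both frames, then keep the largest amount per (client_id, date).
combined = (
    pd.concat([dfA, dfB])
    .sort_values(by="amount", ascending=False)
    .drop_duplicates(subset=["client_id", "date"], keep="first")
)

# Assumption: only keys that exist in dfB should survive,
# so restrict the combined frame to dfB's (client_id, date) pairs.
dfResult = (
    combined.merge(dfB[["client_id", "date"]], on=["client_id", "date"], how="inner")
    .sort_values(by=["client_id", "date"], ascending=[True, False])
    .reset_index(drop=True)
)

print(dfResult)
```

Without the inner merge, the result would be the union of both key sets, which may or may not be what you want.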
Answer 1: (score: 0)
Try this:
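# Look up each dfB date in dfA's amounts; dates missing from dfA become 0.0.
# This returns a Series of amounts aligned with dfB, not a full data frame.
(
    dfB.date.map(
        dfA.set_index('date')['amount'].to_dict()
    ).fillna(0.0)
)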
or
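# Left-join dfA onto dfB; dfB's own amount column is renamed to amount_x and
# dropped, so the remaining amount column holds dfA's values (0.0 where missing).
(
    dfB.merge(
        dfA, on=['client_id', 'date'], suffixes=("_x", ""), how='left'
    ).fillna(0.0).drop(columns=["amount_x"])
)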
client_id date amount
0 1 2020-07-11 100.0
1 1 2020-07-10 90.0
2 1 2020-07-09 80.0
3 1 2020-07-07 0.0
4 1 2020-07-06 0.0
5 1 2020-07-05 0.0
5 1 2020-07-04 0.0
3 1 2020-07-03 0.0
4 1 2020-07-02 0.0
5 1 2020-07-01 86.0
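For reference, here is a self-contained sketch of the left-merge approach on the question's sample data. It is a variation, not the exact code above: it drops dfB's placeholder amount column before merging instead of using suffixes, fills only the amount column, and casts it back to integers, on the assumption that integer amounts are wanted.

```python
import pandas as pd

# Sample data copied from the question (dates kept as ISO strings).
dfA = pd.DataFrame({
    "client_id": [1, 1, 1, 1, 1],
    "date": ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-12", "2020-07-01"],
    "amount": [100, 90, 80, 70, 86],
})
dfB = pd.DataFrame({
    "client_id": [1] * 10,
    "date": ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-07", "2020-07-06",
             "2020-07-05", "2020-07-04", "2020-07-03", "2020-07-02", "2020-07-01"],
    "amount": [0] * 10,
})

dfResult = (
    dfB.drop(columns="amount")                       # keep only dfB's keys
    .merge(dfA, on=["client_id", "date"], how="left")  # bring in dfA's amounts
    .fillna({"amount": 0})                           # keys missing from dfA get 0
    .astype({"amount": "int64"})                     # undo the float upcast from NaN
)

print(dfResult)
```

A left merge keeps dfB's row order and never adds dates that exist only in dfA, so no extra filtering or sorting is needed here.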