I have the following DataFrame df1:
Date Invoice Name Price Coupon Location
0 2017-12-24 700349 John Doe 59.95 NONE VAGG1
1 2017-12-24 700347 Joe Smith 59.95 GBMR GG
2 2017-12-24 700345 Dave Johnson 35.00 CHANGE VAGG1
3 2017-12-24 700342 Sue Davis 35.00 GADSLR VAGG1
4 2017-12-23 700329 Betty Clark 84.95 GADSLR GG2
and a second DataFrame df2:
Date Invoice Name Price Coupon Location
0 2017-12-24 800349 John Doe 59.95 NONE VAGG1
1 2017-12-24 800347 Joe Smith 59.95 GBMR GG
2 2017-12-24 800345 John Doe 17.95 CHANGE VAGG1
3 2017-12-24 800342 John Doe 9.95 GADSLR VAGG1
4 2017-12-23 800329 Sue Simpson 34.95 GADSLR GG2
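I want to create a third DataFrame df3 using the following logic: for each name in df1, check whether it also appears in df2. If it does, add to df3 every df2 row for that name whose price does not match the price for that name in df1. So the output DataFrame df3 should look like this: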
+------------+---------+----------+-------+--------+----------+
| Date | Invoice | Name | Price | Coupon | Location |
+------------+---------+----------+-------+--------+----------+
| 2017-12-24 | 800345 | John Doe | 17.95 | CHANGE | VAGG1 |
| 2017-12-24 | 800342 | John Doe | 9.95 | GADSLR | VAGG1 |
+------------+---------+----------+-------+--------+----------+
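A minimal sketch of the logic described above, not taken from the thread. It assumes each name appears at most once in df1 (true for the sample data), because `Series.map` needs a unique lookup index. It uses a name-to-price lookup instead of a merge, so df2's rows and column order are kept unchanged. Dates are kept as plain strings for brevity.

```python
import pandas as pd

# Sample data from the question (dates kept as strings for brevity)
df1 = pd.DataFrame({
    'Date': ['2017-12-24', '2017-12-24', '2017-12-24', '2017-12-24', '2017-12-23'],
    'Invoice': [700349, 700347, 700345, 700342, 700329],
    'Name': ['John Doe', 'Joe Smith', 'Dave Johnson', 'Sue Davis', 'Betty Clark'],
    'Price': [59.95, 59.95, 35.00, 35.00, 84.95],
    'Coupon': ['NONE', 'GBMR', 'CHANGE', 'GADSLR', 'GADSLR'],
    'Location': ['VAGG1', 'GG', 'VAGG1', 'VAGG1', 'GG2'],
})
df2 = pd.DataFrame({
    'Date': ['2017-12-24', '2017-12-24', '2017-12-24', '2017-12-24', '2017-12-23'],
    'Invoice': [800349, 800347, 800345, 800342, 800329],
    'Name': ['John Doe', 'Joe Smith', 'John Doe', 'John Doe', 'Sue Simpson'],
    'Price': [59.95, 59.95, 17.95, 9.95, 34.95],
    'Coupon': ['NONE', 'GBMR', 'CHANGE', 'GADSLR', 'GADSLR'],
    'Location': ['VAGG1', 'GG', 'VAGG1', 'VAGG1', 'GG2'],
})

# Name -> Price lookup built from df1 (assumes names in df1 are unique)
price_by_name = df1.set_index('Name')['Price']

# df1's price for every df2 row; NaN where the name does not occur in df1
df1_price = df2['Name'].map(price_by_name)

# Keep df2 rows whose name is in df1 but whose price differs from df1's price
df3 = df2[df1_price.notna() & (df2['Price'] != df1_price)]
print(df3)
```

The `notna()` check matters: without it, a name missing from df1 (such as Sue Simpson) would compare NaN against its price, which counts as "not equal" and would keep the row.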
Answer 0 (score: 1)
Use merge + query:
df1.merge(df2[['Name', 'Price']], on='Name')\
.query('Price_x != Price_y')\
.drop('Price_x', axis=1)\
.rename(columns={'Price_y' : 'Price'})
Date Invoice Name Coupon Location Price
1 2017-12-24 700349 John Doe NONE VAGG1 17.95
2 2017-12-24 700349 John Doe NONE VAGG1 9.95
where df1 and df2 are your respective DataFrames.
Answer 1 (score: 1)
The following code block:
import pandas as pd
df3 = pd.merge(df1, df2, on='Name', how='right')\
.query('Price_x != Price_y')\
.drop('Price_x', axis=1)\
.rename(columns={'Price_y' : 'Price'})
df3 =
Date_x Invoice_x Name Coupon_x Location_x Date_y Invoice_y Price Coupon_y Location_y
1 2017-12-24 700349.0 John Doe NONE VAGG1 2017-12-24 800345 17.95 CHANGE VAGG1
2 2017-12-24 700349.0 John Doe NONE VAGG1 2017-12-24 800342 9.95 GADSLR VAGG1
4 NaN NaN Sue Simpson NaN NaN 2017-12-23 800329 34.95 GADSLR GG2
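The Sue Simpson row appears because `how='right'` keeps every df2 name, including ones with no match in df1, and for those rows `Price_x` is NaN. Since NaN compares unequal to everything, `Price_x != Price_y` is True and the query keeps the row. A small sketch that makes this visible with `merge`'s `indicator=True` parameter, reproducing only the Name and Price columns from the question:

```python
import pandas as pd

# Only the Name and Price columns from the question are needed here
df1 = pd.DataFrame({
    'Name': ['John Doe', 'Joe Smith', 'Dave Johnson', 'Sue Davis', 'Betty Clark'],
    'Price': [59.95, 59.95, 35.00, 35.00, 84.95],
})
df2 = pd.DataFrame({
    'Name': ['John Doe', 'Joe Smith', 'John Doe', 'John Doe', 'Sue Simpson'],
    'Price': [59.95, 59.95, 17.95, 9.95, 34.95],
})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
merged = pd.merge(df1, df2, on='Name', how='right', indicator=True)
print(merged)

# Sue Simpson only exists in df2, so Price_x is NaN for that row, and
# NaN != 34.95 evaluates to True, so the query keeps it
kept = merged.query('Price_x != Price_y')
print(kept[['Name', 'Price_x', 'Price_y', '_merge']])

# Keeping only rows found in both frames drops the unmatched name
print(kept[kept['_merge'] == 'both'])
```

Row order after a right merge can differ between pandas versions, so compare by name rather than by position when checking the result.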
Extended code block:
df3 = pd.merge(df1, df2, on='Name', how='right')\
.query('Price_x != Price_y')\
.drop('Price_x', axis=1)\
.rename(columns={'Price_y' : 'Price'})\
.drop('Location_x', axis=1)\
.drop('Coupon_x', axis=1)\
.drop('Date_x', axis=1)\
.drop('Invoice_x', axis=1)\
.rename(columns={'Date_y' : 'Date'})\
.rename(columns={'Invoice_y' : 'Invoice'})\
.rename(columns={'Coupon_y' : 'Coupon'})\
.rename(columns={'Location_y' : 'Location'})
df3 =
Name Date Invoice Price Coupon Location
1 John Doe 2017-12-24 800345 17.95 CHANGE VAGG1
2 John Doe 2017-12-24 800342 9.95 GADSLR VAGG1
4 Sue Simpson 2017-12-23 800329 34.95 GADSLR GG2
This is problematic because the columns end up out of order. Adding:
df3=df3[['Date', 'Invoice', 'Name', 'Price', 'Coupon', 'Location']]
we get df3 =
Date Invoice Name Price Coupon Location
1 2017-12-24 800345 John Doe 17.95 CHANGE VAGG1
2 2017-12-24 800342 John Doe 9.95 GADSLR VAGG1
4 2017-12-23 800329 Sue Simpson 34.95 GADSLR GG2
This is correct except for the "Sue Simpson" entry, which should not be there.
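One way to avoid both the extra Sue Simpson row and the column reshuffling is to put df2 on the left of a default (inner) merge and bring in only df1's Name and Price. This is a sketch, not the answerer's code; it assumes names in df1 are unique, and the `_df1` suffix is an arbitrary choice for the helper column.

```python
import pandas as pd

# Sample data from the question (dates kept as strings for brevity)
df1 = pd.DataFrame({
    'Date': ['2017-12-24', '2017-12-24', '2017-12-24', '2017-12-24', '2017-12-23'],
    'Invoice': [700349, 700347, 700345, 700342, 700329],
    'Name': ['John Doe', 'Joe Smith', 'Dave Johnson', 'Sue Davis', 'Betty Clark'],
    'Price': [59.95, 59.95, 35.00, 35.00, 84.95],
    'Coupon': ['NONE', 'GBMR', 'CHANGE', 'GADSLR', 'GADSLR'],
    'Location': ['VAGG1', 'GG', 'VAGG1', 'VAGG1', 'GG2'],
})
df2 = pd.DataFrame({
    'Date': ['2017-12-24', '2017-12-24', '2017-12-24', '2017-12-24', '2017-12-23'],
    'Invoice': [800349, 800347, 800345, 800342, 800329],
    'Name': ['John Doe', 'Joe Smith', 'John Doe', 'John Doe', 'Sue Simpson'],
    'Price': [59.95, 59.95, 17.95, 9.95, 34.95],
    'Coupon': ['NONE', 'GBMR', 'CHANGE', 'GADSLR', 'GADSLR'],
    'Location': ['VAGG1', 'GG', 'VAGG1', 'VAGG1', 'GG2'],
})

# df2 on the left keeps its Invoice/Coupon/Location; the default inner join
# drops df2 names missing from df1 (e.g. Sue Simpson). Only the overlapping
# Price column is suffixed: df2's stays 'Price', df1's becomes 'Price_df1'
merged = pd.merge(df2, df1[['Name', 'Price']], on='Name', suffixes=('', '_df1'))

# Keep rows where the prices differ, then drop the helper column;
# the remaining columns are already in df2's original order
df3 = (merged[merged['Price'] != merged['Price_df1']]
       .drop(columns='Price_df1')
       .reset_index(drop=True))
print(df3)
```

Because df2's columns never receive a suffix, no `drop`/`rename` chain or manual column reordering is needed afterwards.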