我有2个数据框,
第一个是
+------------------------------------------+
| ID CustomerType Choice |
+------------------------------------------+
| 0 1.0 Durability OEM |
| 1 2.0 Price OEM |
| 2 3.0 Durability OEM |
| 3 4.0 Durability OEM |
| 4 5.0 Durability OEM |
| 5 6.0 ManufacturerCredibility OEM |
| 6 7.0 Warranty OEM |
| 7 8.0 ManufacturerCredibility OEM |
| 8 9.0 Durability OEM |
| 9 10.0 Price OEM |
| 10 11.0 Durability TPN |
| 11 12.0 Warranty OEM |
| 12 13.0 Durability TPN |
+------------------------------------------+
,第二个是;
--------------------------------------------------------+
| Price Durability Warranty Manufacture Type |
+--------------------------------------------------------+
| OEM 1.00 4.0 4.0 4.0 OEM |
| TPN 0.80 4.0 1.0 1.0 TPN |
| Reman 0.55 4.0 0.5 1.0 Reman |
| Reuse 0.45 2.5 0.0 1.0 Reuse |
+--------------------------------------------------------+
我需要使用第一个数据帧中的“Choice”和第二个数据帧中的“Type”来连接这两个数据帧。
目前我正在使用
data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='left')
这样的结果很尴尬。
+------------------------------------------------------------------------------+
| Price Durability Warranty Manufacture CustomerType Choice |
+------------------------------------------------------------------------------+
| 0 1.0 4.0 4.0 4.0 Durability OEM |
| 1 1.0 4.0 4.0 4.0 Price OEM |
| 2 1.0 4.0 4.0 4.0 Durability OEM |
| 3 1.0 4.0 4.0 4.0 Durability OEM |
| 4 1.0 4.0 4.0 4.0 Durability OEM |
| 5 1.0 4.0 4.0 4.0 ManufacturerCredibility OEM |
| 6 1.0 4.0 4.0 4.0 Warranty OEM |
| 7 1.0 4.0 4.0 4.0 ManufacturerCredibility OEM |
| 8 1.0 4.0 4.0 4.0 Durability OEM |
| 9 1.0 4.0 4.0 4.0 Price OEM |
| 10 1.0 4.0 4.0 4.0 Warranty OEM |
| 11 1.0 4.0 4.0 4.0 Price OEM |
| 12 1.0 4.0 4.0 4.0 Warranty OEM |
| 13 1.0 4.0 4.0 4.0 ManufacturerCredibility OEM |
+------------------------------------------------------------------------------+
从结果表中我们可以看到所有行都包含来自第二个数据帧的OEM数据。我在这做错了什么?
答案 0 :(得分:0)
我从示例数据中得到了一些不同的输出,最后一行和3.rd
行正确合并,left join
也没有通过合并列进行排序:
data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='left')
print (data)
ID CustomerType Choice ... Warranty Manufacture Type
0 1.0 Durability OEM ... 4.0 4.0 OEM
1 2.0 Price OEM ... 4.0 4.0 OEM
2 3.0 Durability OEM ... 4.0 4.0 OEM
3 4.0 Durability OEM ... 4.0 4.0 OEM
4 5.0 Durability OEM ... 4.0 4.0 OEM
5 6.0 ManufacturerCredibility OEM ... 4.0 4.0 OEM
6 7.0 Warranty OEM ... 4.0 4.0 OEM
7 8.0 ManufacturerCredibility OEM ... 4.0 4.0 OEM
8 9.0 Durability OEM ... 4.0 4.0 OEM
9 10.0 Price OEM ... 4.0 4.0 OEM
10 11.0 Durability TPN ... 1.0 1.0 TPN
11 12.0 Warranty OEM ... 4.0 4.0 OEM
12 13.0 Durability TPN ... 1.0 1.0 TPN
[13 rows x 8 columns]
如果排序似乎使用inner join
(默认值):
data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type')
#same as
#data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='inner')
print (data)
ID CustomerType Choice ... Warranty Manufacture Type
0 1.0 Durability OEM ... 4.0 4.0 OEM
1 2.0 Price OEM ... 4.0 4.0 OEM
2 3.0 Durability OEM ... 4.0 4.0 OEM
3 4.0 Durability OEM ... 4.0 4.0 OEM
4 5.0 Durability OEM ... 4.0 4.0 OEM
5 6.0 ManufacturerCredibility OEM ... 4.0 4.0 OEM
6 7.0 Warranty OEM ... 4.0 4.0 OEM
7 8.0 ManufacturerCredibility OEM ... 4.0 4.0 OEM
8 9.0 Durability OEM ... 4.0 4.0 OEM
9 10.0 Price OEM ... 4.0 4.0 OEM
10 12.0 Warranty OEM ... 4.0 4.0 OEM
11 11.0 Durability TPN ... 1.0 1.0 TPN
12 13.0 Durability TPN ... 1.0 1.0 TPN
[13 rows x 8 columns]