I want to join two DataFrames with pandas in Python. The first one is:
customer_orders = pd.DataFrame({'customerID': [1, 2, 2, 1],
'customerName': ['John', 'Anna', 'Anna', 'John'],
'customerAge': [21, 45, 45, 21],
'orderID': [255, 256, 257, 258],
'paymentType': ['visa', 'bank', 'master', 'paypal']})
which creates:
customerAge customerID customerName orderID paymentType
0 21 1 John 255 visa
1 45 2 Anna 256 bank
2 45 2 Anna 257 master
3 21 1 John 258 paypal
and the second one is:
order_products = pd.DataFrame({'orderID': [255, 255, 257, 258, 255, 257],
'price': [9.99, 23.40, 15.89, 3.99, 89.50, 23.40],
'productName': ['filter', 'cosmetic', 'shampoo', 'tissues', 'elecBrush', 'cosmetic']})
which creates:
orderID price productName
0 255 9.99 filter
1 255 23.40 cosmetic
2 257 15.89 shampoo
3 258 3.99 tissues
4 255 89.50 elecBrush
5 257 23.40 cosmetic
The expected output is:
customerAge customerID customerName orderID paymentType orderID price productName
21 1 John 255 visa 255 9.99 filter
21 1 John 255 visa 255 23.40 cosmetic
21 1 John 255 visa 255 89.50 elecBrush
45 2 Anna 256 bank null null null
45 2 Anna 257 master 257 15.89 shampoo
45 2 Anna 257 master 257 23.40 cosmetic
21 1 John 258 paypal 258 3.99 tissues
As far as I understand, this is a SQL left join. But using
all = customer_orders.join(order_products, on="orderID", how='left', lsuffix='_left', rsuffix='_right')
does not give me what I want (too few rows, and NaN instead of the values from the second table).
What am I missing?
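Answer 0: (score: 4)
Left? No, this is an outer join. (For this particular data, how='left' returns the same rows, because every orderID in order_products also appears in customer_orders.)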
customer_orders.merge(order_products, on="orderID", how='outer')
customerAge customerID customerName orderID paymentType price productName
0 21 1 John 255 visa 9.99 filter
1 21 1 John 255 visa 23.40 cosmetic
2 21 1 John 255 visa 89.50 elecBrush
3 45 2 Anna 256 bank NaN NaN
4 45 2 Anna 257 master 15.89 shampoo
5 45 2 Anna 257 master 23.40 cosmetic
6 21 1 John 258 paypal 3.99 tissues
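For readers wondering why the original `join` call returned too few rows and only NaN: `DataFrame.join(other, on=...)` matches the named column of the calling frame against the index of `other`, not against a column of the same name. The sketch below rebuilds the two frames from the question and contrasts the failing call with a variant that first moves `orderID` into the index. The variable names `joined` and `fixed` are only illustrative.

```python
import pandas as pd

customer_orders = pd.DataFrame({
    'customerID': [1, 2, 2, 1],
    'customerName': ['John', 'Anna', 'Anna', 'John'],
    'customerAge': [21, 45, 45, 21],
    'orderID': [255, 256, 257, 258],
    'paymentType': ['visa', 'bank', 'master', 'paypal'],
})
order_products = pd.DataFrame({
    'orderID': [255, 255, 257, 258, 255, 257],
    'price': [9.99, 23.40, 15.89, 3.99, 89.50, 23.40],
    'productName': ['filter', 'cosmetic', 'shampoo', 'tissues',
                    'elecBrush', 'cosmetic'],
})

# The call from the question: orderID values 255..258 are looked up in the
# default integer index (0..5) of order_products, so nothing matches.
joined = customer_orders.join(order_products, on='orderID', how='left',
                              lsuffix='_left', rsuffix='_right')

# Moving orderID into the index of the right-hand frame lets join() match
# on the intended key.
fixed = customer_orders.join(order_products.set_index('orderID'),
                             on='orderID', how='left')

print(len(joined), joined['price'].isna().all())
print(len(fixed), fixed['price'].isna().sum())
```

`merge(on="orderID")` avoids the index detour because it matches column against column, which is probably why both answers reach for it.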
Answer 1: (score: 0)
Try using merge:
all = customer_orders.merge(order_products, on="orderID", how='left')
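A self-contained sketch of that left merge, assuming the two frames from the question. The `indicator=True` argument is an addition not used in the thread; it appends a `_merge` column that shows which customer orders found no matching product. The names `result` and `unmatched` are illustrative, and `result` is used instead of `all` to avoid shadowing the Python builtin.

```python
import pandas as pd

customer_orders = pd.DataFrame({
    'customerID': [1, 2, 2, 1],
    'customerName': ['John', 'Anna', 'Anna', 'John'],
    'customerAge': [21, 45, 45, 21],
    'orderID': [255, 256, 257, 258],
    'paymentType': ['visa', 'bank', 'master', 'paypal'],
})
order_products = pd.DataFrame({
    'orderID': [255, 255, 257, 258, 255, 257],
    'price': [9.99, 23.40, 15.89, 3.99, 89.50, 23.40],
    'productName': ['filter', 'cosmetic', 'shampoo', 'tissues',
                    'elecBrush', 'cosmetic'],
})

# Left merge: keep every row of customer_orders and repeat it once per
# matching product; orders without products get NaN in the product columns.
result = customer_orders.merge(order_products, on='orderID', how='left',
                               indicator=True)

# Rows flagged 'left_only' are customer orders with no product rows.
unmatched = result.loc[result['_merge'] == 'left_only', 'orderID'].tolist()

print(result)
print(unmatched)
```

With this data only order 256 has no products, so it is the single row that carries NaN in `price` and `productName`.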