import pandas as pd

# One row per store visit: store id, visit timestamp, auditor, scene id
stores = [[232, '2016-02-05 04:30:00', 'Test User', 1],
          [332, '2016-02-06 04:30:00', 'Test User', 2],
          [432, '2016-02-07 04:30:00', 'Test User', 3],
          [532, '2016-02-08 04:30:00', 'Test User', 4],
          [632, '2016-02-09 04:30:00', 'Test User', 5]]
visits = pd.DataFrame(data=stores, columns=['store', 'visit', 'auditor', 'scene'])
visits.set_index(['store', 'visit'], inplace=True)

# Product data exists only for scenes 1 and 5
scenes = [[1, 1551, 2],
          [5, 1661, 4]]
scenes = pd.DataFrame(data=scenes, columns=['scene', 'product', 'amount'])
scenes.set_index('scene', inplace=True)

store_with_products = pd.merge(visits, scenes, left_on='scene', right_index=True, how='right')
The result I get is:
                             auditor  scene  product  amount
store visit
232   2016-02-05 04:30:00  Test User      1     1551       2
632   2016-02-09 04:30:00  Test User      5     1661       4
But I am doing a right join.
Why don't I get the complete stores frame, filled with NaN wherever the scenes frame has no matching data?
How can I fix this?
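Answer 0 (score: 0)
You need a left join, not a right join. A right join keeps every row of the right frame (scenes), which contains only scenes 1 and 5, so only the two matching visits survive. A left join keeps every row of visits and fills product and amount with NaN where no scene matches. With how='left' it works: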
                             auditor  scene  product  amount
store visit
232   2016-02-05 04:30:00  Test User      1   1551.0     2.0
332   2016-02-06 04:30:00  Test User      2      NaN     NaN
432   2016-02-07 04:30:00  Test User      3      NaN     NaN
532   2016-02-08 04:30:00  Test User      4      NaN     NaN
632   2016-02-09 04:30:00  Test User      5   1661.0     4.0
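For reference, here is a minimal sketch of both joins side by side, rebuilt from the data in the question. The names right_join and left_join are only illustrative and do not appear in the original post. Chaining set_index instead of using inplace=True is a stylistic choice and should not change the result.

```python
import pandas as pd

# Same data as in the question
stores = [[232, '2016-02-05 04:30:00', 'Test User', 1],
          [332, '2016-02-06 04:30:00', 'Test User', 2],
          [432, '2016-02-07 04:30:00', 'Test User', 3],
          [532, '2016-02-08 04:30:00', 'Test User', 4],
          [632, '2016-02-09 04:30:00', 'Test User', 5]]
visits = pd.DataFrame(
    stores, columns=['store', 'visit', 'auditor', 'scene']
).set_index(['store', 'visit'])

scenes = pd.DataFrame(
    [[1, 1551, 2], [5, 1661, 4]], columns=['scene', 'product', 'amount']
).set_index('scene')

# how='right' keeps every row of scenes: only scenes 1 and 5 exist there
right_join = pd.merge(visits, scenes, left_on='scene',
                      right_index=True, how='right')

# how='left' keeps every row of visits: unmatched scenes get NaN
left_join = pd.merge(visits, scenes, left_on='scene',
                     right_index=True, how='left')

print(len(right_join), len(left_join))
print(left_join)
```

Note that product and amount become floating-point columns in the left join, because NaN cannot be stored in a plain integer column. That is why the output above shows 1551.0 and 2.0 rather than 1551 and 2.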