I have two DataFrames, train_family_sales:
       family  store_nbr        date  unit_sales
0   GROCERY I        1.0  2016-08-01         3.0
1   GROCERY I        1.0  2016-08-02        10.0
2   GROCERY I        1.0  2016-08-04         3.0
3  AUTOMOTIVE        1.0  2016-08-05         5.0
4  AUTOMOTIVE        1.0  2016-08-06         5.0
and train_sales:
         date  store_nbr  item_nbr  unit_sales      family
0  2016-08-01        1.0    103520         3.0   GROCERY I
1  2016-08-02        1.0    103520         1.0   GROCERY I
2  2016-08-04        1.0    103520         6.0   GROCERY I
3  2016-08-05        1.0    103520         2.0  AUTOMOTIVE
4  2016-08-06        1.0    103520         2.0  AUTOMOTIVE
I want to merge them so that I get the following:
         date  store_nbr  item_nbr  unit_sales      family  f_unit_sales
0  2016-08-01        1.0    103520         3.0   GROCERY I           3.0
1  2016-08-02        1.0    103520         1.0   GROCERY I          10.0
2  2016-08-04        1.0    103520         3.0   GROCERY I           3.0
3  2016-08-05        1.0    103520         2.0  AUTOMOTIVE           5.0
4  2016-08-06        1.0    103520         2.0  AUTOMOTIVE           6.0
I am trying to do the following:
both_sales = train_sales_with_family.join(train_family_sales,how='left', on=['store_nbr','family','date'], rsuffix='f_')
But I get an error:
ValueError: len(left_on) must equal the number of levels in the index of "right"
Any suggestions on how to merge these?
Answer 0 (score: 2)
I think you need merge:
both_sales = train_sales.merge(train_family_sales,
                               how='left',
                               on=['store_nbr','family','date'],
                               suffixes=('','_'))
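For reference, here is a minimal self-contained sketch of the merge approach, with the sample frames rebuilt from the data in the question. Renaming unit_sales to f_unit_sales before merging is an addition made here (not part of the answer's code) to get the column name the question asks for; the names keys and family_totals are only illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the question's data
dates = ["2016-08-01", "2016-08-02", "2016-08-04", "2016-08-05", "2016-08-06"]
families = ["GROCERY I", "GROCERY I", "GROCERY I", "AUTOMOTIVE", "AUTOMOTIVE"]

train_family_sales = pd.DataFrame({
    "family": families,
    "store_nbr": [1.0] * 5,
    "date": dates,
    "unit_sales": [3.0, 10.0, 3.0, 5.0, 5.0],
})
train_sales = pd.DataFrame({
    "date": dates,
    "store_nbr": [1.0] * 5,
    "item_nbr": [103520] * 5,
    "unit_sales": [3.0, 1.0, 6.0, 2.0, 2.0],
    "family": families,
})

keys = ["store_nbr", "family", "date"]  # illustrative name for the join keys

# Rename the family-level column up front so no suffix is needed
family_totals = train_family_sales.rename(columns={"unit_sales": "f_unit_sales"})

# A left merge keeps every row of train_sales and matches on the key columns
both_sales = train_sales.merge(family_totals, how="left", on=keys)
print(both_sales)
```

Because merge matches on columns rather than on the index, neither frame needs set_index here.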
Or add set_index for join. The right DataFrame's MultiIndex needs the same number of levels as there are columns in the on parameter:
both_sales = train_sales.join(train_family_sales.set_index(['store_nbr','family','date']),
                              on=['store_nbr','family','date'],
                              rsuffix='_')
print(both_sales)
         date  store_nbr  item_nbr  unit_sales      family  unit_sales_
0  2016-08-01        1.0    103520         3.0   GROCERY I          3.0
1  2016-08-02        1.0    103520         1.0   GROCERY I         10.0
2  2016-08-04        1.0    103520         6.0   GROCERY I          3.0
3  2016-08-05        1.0    103520         2.0  AUTOMOTIVE          5.0
4  2016-08-06        1.0    103520         2.0  AUTOMOTIVE          5.0
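To see why the original join call failed, the sketch below checks the number of index levels on the right frame before and after set_index. It assumes the sample data from the question; the names keys and indexed are only illustrative. join matches the on columns against the right frame's index, so a default single-level index cannot line up with three key columns.

```python
import pandas as pd

# Sample frames rebuilt from the question's data
dates = ["2016-08-01", "2016-08-02", "2016-08-04", "2016-08-05", "2016-08-06"]
families = ["GROCERY I", "GROCERY I", "GROCERY I", "AUTOMOTIVE", "AUTOMOTIVE"]

train_family_sales = pd.DataFrame({
    "family": families,
    "store_nbr": [1.0] * 5,
    "date": dates,
    "unit_sales": [3.0, 10.0, 3.0, 5.0, 5.0],
})
train_sales = pd.DataFrame({
    "date": dates,
    "store_nbr": [1.0] * 5,
    "item_nbr": [103520] * 5,
    "unit_sales": [3.0, 1.0, 6.0, 2.0, 2.0],
    "family": families,
})

keys = ["store_nbr", "family", "date"]  # illustrative name for the join keys

# With the default index the right frame has one level, but three keys are given
print(train_family_sales.index.nlevels)

# After set_index the right frame has a MultiIndex with one level per key
indexed = train_family_sales.set_index(keys)
print(indexed.index.nlevels)

# Now the level count matches len(keys), so join can align on the key columns;
# rsuffix renames the overlapping unit_sales column coming from the right frame
joined = train_sales.join(indexed, on=keys, rsuffix="_")
print(joined)
```

join defaults to how='left', so every row of train_sales is kept, just as in the merge version.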