Python / Pandas - merging on an index when I have two index levels

Time: 2017-08-23 15:32:08

Tags: python pandas

I have a DataFrame with a two-level index that looks like this:

bal:

                    ano             unit period
business_id id                                 
9564        302    2012            reais  anual
            303    2011            reais  anual
2361        304    2013            reais  anual
            305    2012            reais  anual
2369        306    2013            reais  anual
            307    2012            reais  anual

I have another DataFrame that looks like this:

accounts:

                           A                     B
id
302               5964168.52          1.097601e+07
303               5774707.15          1.086787e+07
304               3652575.31          6.608469e+06 
305                321076.15          6.027066e+06
306               3858137.49          9.733126e+06

I want to merge them so that the result looks like this:

                    ano             unit period              A                     B
business_id id                                 
9564        302    2012            reais  anual     5964168.52          1.097601e+07
            303    2011            reais  anual     5774707.15          1.086787e+07
2361        304    2013            reais  anual     3652575.31          6.608469e+06
            305    2012            reais  anual      321076.15          6.027066e+06
2369        306    2013            reais  anual     3858137.49          9.733126e+06 

This is what I tried:

bal=bal.merge(accounts,left_on='id', right_index=True)

But I think the syntax is incorrect, because I get a ValueError:

ValueError: len(right_on) must equal the number of levels in the index of "left"

Can anyone help?

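2 answers:

Answer 0: (score: 1)

At the time of writing, it is not possible to join on a specific level of a MultiIndex. You can only join on the whole index or on columns.

So, before joining, you have to take business_id out of the MultiIndex and put it back afterwards: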

result = (bal.reset_index('business_id').join(accounts, how='inner')
          .set_index(['business_id'], append=True))

A complete example:

import pandas as pd

bal = pd.DataFrame({
    'ano': [2012, 2011, 2013, 2012, 2013, 2012],
    'business_id': [9564, 9564, 2361, 2361, 2369, 2369],
    'id': [302, 303, 304, 305, 306, 307],
    'period': ['anual', 'anual', 'anual', 'anual', 'anual', 'anual'],
    'unit': ['reais', 'reais', 'reais', 'reais', 'reais', 'reais']})
bal = bal.set_index(['business_id', 'id'])

accounts = pd.DataFrame({
    'A': [5964168.52, 5774707.15, 3652575.31, 321076.15, 3858137.49],
    'B': [10976010.0, 10867870.0, 6608469.0, 6027066.0, 9733126.0],
    'id': [302, 303, 304, 305, 306]})
accounts = accounts.set_index('id')

result = (bal.reset_index('business_id').join(accounts, how='inner')
          .set_index(['business_id'], append=True))

print(result)

which yields:

                  ano period   unit           A           B
id  business_id                                            
302 9564         2012  anual  reais  5964168.52  10976010.0
303 9564         2011  anual  reais  5774707.15  10867870.0
304 2361         2013  anual  reais  3652575.31   6608469.0
305 2361         2012  anual  reais   321076.15   6027066.0
306 2369         2013  anual  reais  3858137.49   9733126.0
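
Note that the output above has its index levels in the order (id, business_id), while the question asks for (business_id, id). The following is a minimal sketch of one way to restore the requested order, assuming the same sample data as above; appending `swaplevel()` to the chain is my addition, not part of the original answer.

```python
import pandas as pd

# Sample data taken from the question.
bal = pd.DataFrame({
    'ano': [2012, 2011, 2013, 2012, 2013, 2012],
    'business_id': [9564, 9564, 2361, 2361, 2369, 2369],
    'id': [302, 303, 304, 305, 306, 307],
    'period': ['anual'] * 6,
    'unit': ['reais'] * 6,
}).set_index(['business_id', 'id'])

accounts = pd.DataFrame({
    'A': [5964168.52, 5774707.15, 3652575.31, 321076.15, 3858137.49],
    'B': [10976010.0, 10867870.0, 6608469.0, 6027066.0, 9733126.0],
    'id': [302, 303, 304, 305, 306],
}).set_index('id')

result = (bal.reset_index('business_id')         # leave only 'id' in the index
          .join(accounts, how='inner')           # inner join drops id 307 (no match)
          .set_index('business_id', append=True) # index is now (id, business_id)
          .swaplevel())                          # swap to (business_id, id)

print(result)
```

Because the join is an inner join, the row for id 307 is dropped, which matches the desired output in the question. Using `how='left'` instead should keep that row with missing values in A and B.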

Answer 1: (score: 0)

Inspired by ununtbu's answer. Here is a version that uses merge instead:

result = (bal.reset_index(['business_id', 'id'])
          .merge(accounts, left_on='id', right_index=True)
          .set_index(['id', 'business_id']))
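
Both answers work around the index by resetting it first. As a hedged alternative, the pandas user guide describes joining a singly-indexed DataFrame to a MultiIndexed one by matching the index name against a level name. The sketch below assumes an installed pandas version that supports this and that the index of `accounts` is named `id`, as in the sample data; it is not taken from either answer.

```python
import pandas as pd

# Sample data taken from the question.
bal = pd.DataFrame({
    'ano': [2012, 2011, 2013, 2012, 2013, 2012],
    'business_id': [9564, 9564, 2361, 2361, 2369, 2369],
    'id': [302, 303, 304, 305, 306, 307],
    'period': ['anual'] * 6,
    'unit': ['reais'] * 6,
}).set_index(['business_id', 'id'])

accounts = pd.DataFrame({
    'A': [5964168.52, 5774707.15, 3652575.31, 321076.15, 3858137.49],
    'B': [10976010.0, 10867870.0, 6608469.0, 6027066.0, 9733126.0],
    'id': [302, 303, 304, 305, 306],
}).set_index('id')

# Assumption: this pandas version matches the name of accounts' index ('id')
# against the same-named level of bal's MultiIndex.
result = bal.join(accounts, how='inner')

print(result)
```

If this works in your environment, the result keeps the original (business_id, id) level order without any reset_index / set_index round trip. If it raises an error, the approaches in the answers above remain the fallback.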