Merging MultiIndex DataFrames

Date: 2016-04-21 13:55:12

Tags: python pandas

Consider the following two DataFrames:

import numpy as np
import pandas as pd

arrays1 = [['foo', 'bar', 'bar', 'bar'],
           ['A', 'D', 'E', 'F']]
tuples1 = list(zip(*arrays1))
columnValues1 = pd.MultiIndex.from_tuples(tuples1)
df1 = pd.DataFrame(np.random.rand(4,4), columns = columnValues1)
print(df1)
        foo       bar                    
          A         D         E         F
0  0.833444  0.354676  0.468294  0.173005
1  0.409730  0.275342  0.595433  0.322785
2  0.515161  0.340063  0.117509  0.491957
3  0.285594  0.970524  0.322902  0.628351

arrays2 = [['foo', 'foo', 'bar', 'bar'],
          ['B', 'C', 'G', 'H']]
tuples2 = list(zip(*arrays2))          
columnValues2 = pd.MultiIndex.from_tuples(tuples2)
df2 = pd.DataFrame(np.random.rand(4,4), columns = columnValues2)
print(df2)
        foo                 bar          
          B         C         G         H
0  0.208822  0.762884  0.424412  0.583324
1  0.767560  0.884583  0.716843  0.329719
2  0.147991  0.424748  0.560599  0.828155
3  0.376050  0.436354  0.704379  0.406324

Say I want to merge these to get the following:

          foo                                bar                
            A           B          C           D           E           F           G           H
0    0.833444    0.208822   0.762884    0.354676    0.468294    0.173005    0.424412    0.583324
1    0.409730    0.767560   0.884583    0.275342    0.595433    0.322785    0.716843    0.329719
2    0.515161    0.147991   0.424748    0.340063    0.117509    0.491957    0.560599    0.828155
3    0.285594    0.376050   0.436354    0.970524    0.322902    0.628351    0.704379    0.406324

I tried using merge:

pd.merge(df1.reset_index(), df2.reset_index(), on=df1.columns.levels[0], 
how='inner').set_index(df1.columns.levels[0])

Unfortunately, I get the following error message:

ValueError: The truth value of an array with more than one element is ambiguous. 
Use a.any() or a.all()

How can I merge two MultiIndex DataFrames?

2 answers:

Answer 0 (score: 1)

This isn't really a "merge", because you aren't matching values between the DataFrames; you are only adding some columns side by side. So pd.concat can do what you need:

pd.concat
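
The answer names only the function, so the following is a minimal sketch of how it might be applied, not necessarily the call the answerer had in mind. It assumes both frames share the same row index, uses `axis=1` to place them side by side, and then selects the level-0 labels in the desired order so that columns under the same label sit together. The seeded generator `rng` and the name `combined` are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild frames with the same column layout as in the question
cols1 = pd.MultiIndex.from_tuples(
    [('foo', 'A'), ('bar', 'D'), ('bar', 'E'), ('bar', 'F')])
cols2 = pd.MultiIndex.from_tuples(
    [('foo', 'B'), ('foo', 'C'), ('bar', 'G'), ('bar', 'H')])

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
df1 = pd.DataFrame(rng.random((4, 4)), columns=cols1)
df2 = pd.DataFrame(rng.random((4, 4)), columns=cols2)

# Put the two frames side by side; rows are aligned on the row index
combined = pd.concat([df1, df2], axis=1)

# Selecting level-0 labels in a list gathers all columns under each label,
# in the order the labels are given
combined = combined[['foo', 'bar']]
print(combined)
```

After the selection, the second-level labels should read A through H from left to right, matching the layout requested in the question.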

Answer 1 (score: 1)

Update: selecting the columns dynamically:

In [57]: join = df1.join(df2)

In [58]: cols = join.columns.get_level_values(0).unique()

In [59]: cols
Out[59]: array(['foo', 'bar'], dtype=object)

In [60]: join = join[cols]

In [61]: join
Out[61]:
        foo                           bar                                \
          A         B         C         D         E         F         G
0  0.176934  0.694937  0.947164  0.510407  0.085626  0.162183  0.382840
1  0.973283  0.743907  0.886495  0.028961  0.740759  0.330742  0.961932
2  0.898224  0.966278  0.131551  0.517563  0.026104  0.624047  0.848640
3  0.713660  0.704461  0.419997  0.718130  0.252294  0.336838  0.016916


          H
0  0.929695
1  0.444762
2  0.338168
3  0.635817

joined = df1.join(df2)[['foo','bar']]
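
The dynamic selection can be wrapped in a small helper. This is a sketch under assumptions: the function name `group_columns_by_top_level` is hypothetical, and the level-0 labels are converted to a plain list because recent pandas versions return an `Index` from `unique()` rather than the NumPy array shown in Out[59]. The list form works with either.

```python
import numpy as np
import pandas as pd


def group_columns_by_top_level(df):
    """Reorder columns so that those sharing a level-0 label are adjacent.

    Labels keep the order in which they first appear in df.columns.
    (Hypothetical helper, not part of pandas.)
    """
    top_labels = list(df.columns.get_level_values(0).unique())
    return df[top_labels]


cols1 = pd.MultiIndex.from_tuples(
    [('foo', 'A'), ('bar', 'D'), ('bar', 'E'), ('bar', 'F')])
cols2 = pd.MultiIndex.from_tuples(
    [('foo', 'B'), ('foo', 'C'), ('bar', 'G'), ('bar', 'H')])
df1 = pd.DataFrame(np.random.rand(4, 4), columns=cols1)
df2 = pd.DataFrame(np.random.rand(4, 4), columns=cols2)

# Join on the row index, then regroup the columns by top-level label
joined = group_columns_by_top_level(df1.join(df2))
print(joined.columns.tolist())
```

Because the labels are read from the joined frame itself, the helper does not need to know in advance that the top level contains 'foo' and 'bar'.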

Explanation:

You can first join your DataFrames:

In [47]: join = df1.join(df2)

In [48]: join
Out[48]:
        foo       bar                           foo                 bar  \
          A         D         E         F         B         C         G
0  0.176934  0.510407  0.085626  0.162183  0.694937  0.947164  0.382840
1  0.973283  0.028961  0.740759  0.330742  0.743907  0.886495  0.961932
2  0.898224  0.517563  0.026104  0.624047  0.966278  0.131551  0.848640
3  0.713660  0.718130  0.252294  0.336838  0.704461  0.419997  0.016916


          H
0  0.929695
1  0.444762
2  0.338168
3  0.635817

Then select the columns in the desired order (level 0):

In [49]: join = join[['foo','bar']]

In [50]: join
Out[50]:
        foo                           bar                                \
          A         B         C         D         E         F         G
0  0.176934  0.694937  0.947164  0.510407  0.085626  0.162183  0.382840
1  0.973283  0.743907  0.886495  0.028961  0.740759  0.330742  0.961932
2  0.898224  0.966278  0.131551  0.517563  0.026104  0.624047  0.848640
3  0.713660  0.704461  0.419997  0.718130  0.252294  0.336838  0.016916


          H
0  0.929695
1  0.444762
2  0.338168
3  0.635817