Joining two pandas DataFrames when one has MultiIndex columns

Time: 2020-08-14 11:41:58

Tags: pandas join multi-index

I want to join two pandas DataFrames, one of which has MultiIndex columns.

This is how I build the first DataFrame.

import pandas as pd

data_large = pd.DataFrame({"name":["a", "b", "c"], "sell":[10, 60, 50], "buy":[20, 30, 40]})
data_mini = pd.DataFrame({"name":["b", "c", "d"], "sell":[60, 20, 10], "buy":[30, 50, 40]})
data_topix = pd.DataFrame({"name":["a", "b", "c"], "sell":[10, 80, 0], "buy":[70, 30, 40]})

# Index each frame by 'name', then place them side by side under a top-level product key.
df_out = (pd.concat([dfi.set_index('name') for dfi in [data_large, data_mini, data_topix]],
                    keys=['Large', 'Mini', 'Topix'], axis=1)
            .rename_axis(mapper=['name'], axis=0)
            .rename_axis(mapper=['product', 'buy_sell'], axis=1))
df_out

This is the second DataFrame.

group = pd.DataFrame({"name":["a", "b", "c", "d"], "group":[1, 1, 2, 2]})
group

How can I join the second DataFrame to the first on the name column while keeping the MultiIndex columns?

This does not work: it flattens the MultiIndex.

df_final = df_out.merge(group, on=['name'], how='left')

Any help would be greatly appreciated!

1 answer:

Answer 0 (score: 0)

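If you need MultiIndex columns after the merge, convert the columns of group to a MultiIndex first, so that both DataFrames have the same number of column levels. Here the name column is also moved to the index so the merge is done on the index; otherwise the key columns on both sides would have to be converted to MultiIndex labels as well.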

group = group.set_index('name')
group.columns = pd.MultiIndex.from_product([group.columns, ['new']])

df_final = df_out.merge(group, on=['name'], how='left')

Or:

df_final = df_out.merge(group, left_index=True, right_index=True, how='left')

print (df_final)
product  Large        Mini       Topix       group
buy_sell  sell   buy  sell   buy  sell   buy   new
name                                              
a         10.0  20.0   NaN   NaN  10.0  70.0     1
b         60.0  30.0  60.0  30.0  80.0  30.0     1
c         50.0  40.0  20.0  50.0   0.0  40.0     2
d          NaN   NaN  10.0  40.0   NaN   NaN     2
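
As a quick check that the index-based merge keeps both column levels, the sketch below rebuilds the frames from the question and then selects from the result by tuple and by level. The 'new' label is just the placeholder used in this answer; any second-level label should work the same way.

```python
import pandas as pd

data_large = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]})
data_mini = pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]})
data_topix = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]})

df_out = pd.concat(
    [dfi.set_index("name") for dfi in [data_large, data_mini, data_topix]],
    keys=["Large", "Mini", "Topix"],
    axis=1,
)

group = pd.DataFrame({"name": ["a", "b", "c", "d"], "group": [1, 1, 2, 2]}).set_index("name")
# Lift the single-level columns to two levels so both frames match.
group.columns = pd.MultiIndex.from_product([group.columns, ["new"]])

df_final = df_out.merge(group, left_index=True, right_index=True, how="left")

# The columns are still a two-level MultiIndex, so tuple and level-based selection work.
print(df_final.columns.nlevels)
print(df_final[("group", "new")].tolist())
print(df_final.xs("sell", axis=1, level=1))
```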

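Another possible way, though it triggers a warning, is to merge first and convert the columns to a MultiIndex afterwards. This starts again from the original group, with name as a regular column and single-level columns. Newer pandas releases (2.0 and later, to my knowledge) refuse this kind of mixed-level merge with a MergeError instead of a warning, so this variant may only work on older versions.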

df_final = df_out.merge(group, on=['name'], how='left')

UserWarning: merging between different levels can give an unintended result (2 levels on the left, 1 on the right) warnings.warn(msg, UserWarning)


L = [x if isinstance(x, tuple) else (x, 'new') for x in df_final.columns.tolist()]
df_final.columns = pd.MultiIndex.from_tuples(L)   
print (df_final)
  name Large        Mini       Topix       group
   new  sell   buy  sell   buy  sell   buy   new
0    a  10.0  20.0   NaN   NaN  10.0  70.0     1
1    b  60.0  30.0  60.0  30.0  80.0  30.0     1
2    c  50.0  40.0  20.0  50.0   0.0  40.0     2
3    d   NaN   NaN  10.0  40.0   NaN   NaN     2
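
The padding step can be tried on its own, without any merge. The sketch below uses a hypothetical list of flattened labels (plain strings mixed with 2-tuples, similar to what the mixed-level merge leaves behind) and pads each plain label to a 2-tuple before building the MultiIndex.

```python
import pandas as pd

# Hypothetical flattened labels: plain strings mixed with 2-tuples.
flat_columns = ["name", ("Large", "sell"), ("Large", "buy"), "group"]

# Pad every plain label to a 2-tuple; "new" is an arbitrary second-level label.
padded = [c if isinstance(c, tuple) else (c, "new") for c in flat_columns]
columns = pd.MultiIndex.from_tuples(padded)

print(columns.nlevels)
print(columns.tolist())
```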

Edit: if group should become a level of the row MultiIndex instead of a column:

group = group.set_index(['name'])
group.columns = pd.MultiIndex.from_product([group.columns, ['new']])

df_final = (df_out.merge(group, on=['name'], how='left')
                  .set_index([('group','new')], append=True)
                  .rename_axis(['name','group']))
print (df_final)
product    Large        Mini       Topix      
buy_sell    sell   buy  sell   buy  sell   buy
name group                                    
a    1      10.0  20.0   NaN   NaN  10.0  70.0
b    1      60.0  30.0  60.0  30.0  80.0  30.0
c    2      50.0  40.0  20.0  50.0   0.0  40.0
d    2       NaN   NaN  10.0  40.0   NaN   NaN
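
One reason to move group into the row index is that it can then be used directly as a grouping level. The sketch below rebuilds the frames from the question, uses the index-based form of the merge, and sums per group; the aggregation is only an illustration and was not part of the original question.

```python
import pandas as pd

data_large = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]})
data_mini = pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]})
data_topix = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]})

df_out = pd.concat(
    [dfi.set_index("name") for dfi in [data_large, data_mini, data_topix]],
    keys=["Large", "Mini", "Topix"],
    axis=1,
)

group = pd.DataFrame({"name": ["a", "b", "c", "d"], "group": [1, 1, 2, 2]}).set_index("name")
group.columns = pd.MultiIndex.from_product([group.columns, ["new"]])

df_final = (
    df_out.merge(group, left_index=True, right_index=True, how="left")
          # Move the ('group', 'new') column into the row index next to 'name'.
          .set_index([("group", "new")], append=True)
          .rename_axis(["name", "group"])
)

# Sum every (product, buy_sell) column per group; NaN values are skipped.
totals = df_final.groupby(level="group").sum()
print(totals)
```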

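Or, starting from the original group (name as a regular column, single-level columns), which relies on the same mixed-level merge that triggers the warning shown earlier: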

df_final = df_out.merge(group, on=['name'], how='left').set_index(['name','group'])
df_final.columns = pd.MultiIndex.from_tuples(df_final.columns)
print (df_final)
           Large        Mini       Topix      
            sell   buy  sell   buy  sell   buy
name group                                    
a    1      10.0  20.0   NaN   NaN  10.0  70.0
b    1      60.0  30.0  60.0  30.0  80.0  30.0
c    2      50.0  40.0  20.0  50.0   0.0  40.0
d    2       NaN   NaN  10.0  40.0   NaN   NaN