Give multiple pandas DataFrames MultiIndex columns and full outer join them

Time: 2020-08-13 17:06:27

Tags: pandas join reduce multi-index

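Some might say this should be two separate questions, but they are closely related, so I am asking both here.

1. Making the columns a MultiIndex

I have three DataFrames: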

import pandas as pd

data_large = pd.DataFrame({"name":["a", "b", "c"], "sell":[10, 60, 50], "buy":[20, 30, 40]})
data_mini = pd.DataFrame({"name":["b", "c", "d"], "sell":[60, 20, 10], "buy":[30, 50, 40]})
data_topix = pd.DataFrame({"name":["a", "b", "c"], "sell":[10, 80, 0], "buy":[70, 30, 40]})

First, though, I want to turn their columns into a MultiIndex, like this:

[image: desired column layout]

This is what I tried, but it does not work as expected: name ends up under the Nikkei225Large level.

iterables = [['Nikkei225Large'], ['name', 'buy', 'sell']]
index_large = pd.MultiIndex.from_product(iterables, names=['product', 'sell_buy'])
data_large.columns = index_large

[image: result of the attempt above]
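A minimal sketch of one way to keep name out of the product level, assuming name is meant to become the row index rather than a data column (the question's image is not available, so that is a guess). The variable `indexed` is a name introduced here for illustration. Building the second level from the frame's own columns also avoids the labels drifting out of order relative to the data, which a hand-written list such as ['name', 'buy', 'sell'] can do when the actual column order is name, sell, buy.

```python
import pandas as pd

data_large = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]})

# Move `name` into the row index so only sell/buy remain as columns.
indexed = data_large.set_index("name")

# Build the two-level column index from the frame's existing column order,
# so each label stays attached to the data it describes.
indexed.columns = pd.MultiIndex.from_product(
    [["Nikkei225Large"], indexed.columns],
    names=["product", "sell_buy"],
)

print(indexed)
```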

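2. Joining multiple DataFrames that have MultiIndex columns, e.g. with reduce

Next, I want to full outer join the three DataFrames on the name column. The expected output is: [image: expected output]

For now I just join them with reduce, as shown below, but I would like to do this with the MultiIndex columns.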

from functools import reduce

dfs = {0: data_large, 1: data_mini, 2: data_topix}

def agg_df(df_list):
    # Outer-merge the frames pairwise on the shared 'name' column.
    # 'on' cannot be combined with left_index/right_index, so only 'on' is passed.
    return reduce(lambda left, right: pd.merge(left, right, on='name', how='outer'),
                  df_list)

df_final = agg_df(dfs.values())
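If the reduce approach is to be kept, one possible sketch is to label each frame's columns with its product first and then outer join on the index. The helper `add_product_level` and the dict `frames` are hypothetical names introduced here, and the product labels for the second and third frames are assumed from the variable names.

```python
from functools import reduce

import pandas as pd

data_large = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]})
data_mini = pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]})
data_topix = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]})

# Assumed product labels, one per frame.
frames = {"Nikkei225Large": data_large, "Nikkei225Mini": data_mini, "Topix": data_topix}


def add_product_level(df, product):
    """Index by name and put `product` above the existing columns (hypothetical helper)."""
    out = df.set_index("name")
    out.columns = pd.MultiIndex.from_product(
        [[product], out.columns], names=["product", "sell_buy"]
    )
    return out


labelled = [add_product_level(df, product) for product, df in frames.items()]

# Outer join on the shared `name` index; the product level keeps columns distinct,
# so no _x/_y suffixes are needed.
df_final = reduce(lambda left, right: left.join(right, how="outer"), labelled)

print(df_final)
```

Names missing from a frame show up as NaN in that product's columns, which is why the integer columns become floats after the join.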

Any help would be greatly appreciated!

1 Answer:

Answer 0 (score: 1)

IIUC, you can use pd.concat with the keys parameter:

df_out = pd.concat([dfi.set_index('name') for dfi in [data_large, data_mini, data_topix]], 
                   keys=['Nikkei225Large', 'Nikkei225Mini', 'Topix'], axis=1)\
           .rename_axis(index=['Name'], columns=['product','buy_sell'])

Output:

product  Nikkei225Large       Nikkei225Mini       Topix      
buy_sell           sell   buy          sell   buy  sell   buy
Name                                                         
a                  10.0  20.0           NaN   NaN  10.0  70.0
b                  60.0  30.0          60.0  30.0  80.0  30.0
c                  50.0  40.0          20.0  50.0   0.0  40.0
d                   NaN   NaN          10.0  40.0   NaN   NaN
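As a follow-up, here is a short sketch of how the combined frame might be sliced once the columns are a MultiIndex. It rebuilds df_out as in the answer so that it runs on its own; `sell_only` and `topix` are names introduced here for illustration.

```python
import pandas as pd

data_large = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]})
data_mini = pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]})
data_topix = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]})

# Same construction as in the answer: concat side by side, one key per frame.
df_out = pd.concat(
    [dfi.set_index("name") for dfi in [data_large, data_mini, data_topix]],
    keys=["Nikkei225Large", "Nikkei225Mini", "Topix"],
    axis=1,
).rename_axis(index=["Name"], columns=["product", "buy_sell"])

# All "sell" columns across products (a cross-section on the second column level).
sell_only = df_out.xs("sell", axis=1, level="buy_sell")

# Everything for one product (selection on the first column level).
topix = df_out["Topix"]

print(sell_only)
print(topix)
```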