Some might say this should be two separate questions, but they are related, so I am asking both here.
1. Making MultiIndex columns
I have three dataframes:
import pandas as pd

data_large = pd.DataFrame({"name":["a", "b", "c"], "sell":[10, 60, 50], "buy":[20, 30, 40]})
data_mini = pd.DataFrame({"name":["b", "c", "d"], "sell":[60, 20, 10], "buy":[30, 50, 40]})
data_topix = pd.DataFrame({"name":["a", "b", "c"], "sell":[10, 80, 0], "buy":[70, 30, 40]})
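But first, I want to give each of them MultiIndex columns, with the product name (for example Nikkei225Large) as the top level.
This is what I tried, but it does not work as expected: `name` ends up under the column level Nikkei225Large.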
iterables = [['Nikkei225Large'], ['name', 'buy', 'sell']]
index_large = pd.MultiIndex.from_product(iterables, names=['product', 'sell_buy'])
data_large.columns = index_large
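A minimal sketch of one way to get that layout for a single frame, assuming `name` should become the row index instead of a column under the product level. Building the second level from the frame's own columns keeps the labels in their existing order (`sell`, `buy`); the hand-written list in the attempt above uses a different order, so it would also swap the `sell` and `buy` labels. The name `large_indexed` is only illustrative.

```python
import pandas as pd

data_large = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]})

# Move `name` out of the columns so only sell/buy sit under the product level.
# `large_indexed` is a hypothetical name used for this sketch.
large_indexed = data_large.set_index("name")

# Build the second level from the existing columns to preserve their order.
large_indexed.columns = pd.MultiIndex.from_product(
    [["Nikkei225Large"], large_indexed.columns],
    names=["product", "sell_buy"],
)

print(large_indexed)
```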
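2. Joining multiple pandas dataframes that have MultiIndex columns, e.g. using `reduce`
Next, I want to do a full outer join of the three dataframes on the `name` column.
For now, I simply use `reduce` to join them as shown below, but I would like the result to use MultiIndex columns.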
from functools import reduce

dfs = {0: data_large, 1: data_mini, 2: data_topix}

def agg_df(dfList):
    # Outer-merge the frames pairwise on the 'name' column.
    # `on` cannot be combined with left_index/right_index, so only `on` is used.
    df_agged = reduce(lambda left, right: pd.merge(left, right,
                                                   on='name',
                                                   how='outer'), dfList)
    return df_agged

df_final = agg_df(dfs.values())
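For illustration, here is a sketch of how the `reduce` approach could be kept while still ending up with MultiIndex columns. It assumes each frame is first indexed by `name` and given a product level, then outer-joined on the index. The helper `with_product_level` and the names `frames`, `labelled` and `df_joined` are hypothetical and not part of the original code.

```python
from functools import reduce

import pandas as pd

# Hypothetical mapping from product label to frame, using the data from the question.
frames = {
    "Nikkei225Large": pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]}),
    "Nikkei225Mini": pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]}),
    "Topix": pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]}),
}

def with_product_level(df, product):
    """Index the frame by `name` and put `product` above its sell/buy columns."""
    out = df.set_index("name")
    out.columns = pd.MultiIndex.from_product(
        [[product], out.columns], names=["product", "sell_buy"]
    )
    return out

labelled = [with_product_level(df, product) for product, df in frames.items()]

# Outer-join pairwise on the shared `name` index; names missing from a frame become NaN.
df_joined = reduce(lambda left, right: left.join(right, how="outer"), labelled)

print(df_joined)
```

Because the top-level labels differ between frames, the joined columns do not collide, so no `_x`/`_y` suffixes are needed.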
Any help would be greatly appreciated!
Answer 0: (score: 1)
IIUC, you can use `pd.concat` with the `keys` parameter:
df_out = pd.concat([dfi.set_index('name') for dfi in [data_large, data_mini, data_topix]],
                   keys=['Nikkei225Large', 'Nikkei225Mini', 'Topix'], axis=1)\
           .rename_axis(index=['Name'], columns=['product','buy_sell'])
Output:
product  Nikkei225Large       Nikkei225Mini       Topix
buy_sell           sell   buy          sell   buy  sell   buy
Name
a                  10.0  20.0           NaN   NaN  10.0  70.0
b                  60.0  30.0          60.0  30.0  80.0  30.0
c                  50.0  40.0          20.0  50.0   0.0  40.0
d                   NaN   NaN          10.0  40.0   NaN   NaN
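As a usage sketch, once the columns are a MultiIndex, a single product or a single sell/buy slice can be pulled out of a frame like this one. The variable names `topix` and `sells` are illustrative only; the frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

data_large = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]})
data_mini = pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]})
data_topix = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]})

# Same construction as in the answer: concat on the `name` index with one key per product.
df_out = pd.concat(
    [dfi.set_index("name") for dfi in [data_large, data_mini, data_topix]],
    keys=["Nikkei225Large", "Nikkei225Mini", "Topix"],
    axis=1,
).rename_axis(index=["Name"], columns=["product", "buy_sell"])

# All columns for one product (the top level is dropped from the result).
topix = df_out["Topix"]

# The `sell` column of every product, selected on the second column level.
sells = df_out.xs("sell", axis=1, level="buy_sell")

print(topix)
print(sells)
```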