Say I have several 2x2 "blocks" that I want to aggregate into a single DataFrame:
import pandas as pd
import numpy as np
np.random.seed(0)
col_l1 = ['cl1_1', 'cl1_2']
col_l2 = ['cl2_1', 'cl2_2']
row_l1 = ['rl1_1', 'rl1_2']
row_l2 = ['rl2_1', 'rl2_2']
subframes = []
for cl1 in col_l1:
    for rl1 in row_l1:
        # Each block gets a two-level row index and a two-level column index
        idx = pd.MultiIndex.from_tuples([(rl1, r) for r in row_l2])
        cols = pd.MultiIndex.from_tuples([(cl1, c) for c in col_l2])
        data = np.random.randn(2, 2)
        df = pd.DataFrame(data=data, index=idx, columns=cols)
        subframes.append(df)
Of course, I can do this:
combined = pd.concat(subframes, axis=0)
But the result is left with "holes":
                cl1_1               cl1_2
                cl2_1     cl2_2     cl2_1     cl2_2
rl1_1 rl2_1  1.764052  0.400157       NaN       NaN
      rl2_2  0.978738  2.240893       NaN       NaN
rl1_2 rl2_1  1.867558 -0.977278       NaN       NaN
      rl2_2  0.950088 -0.151357       NaN       NaN
rl1_1 rl2_1       NaN       NaN -0.103219  0.410599
      rl2_2       NaN       NaN  0.144044  1.454274
rl1_2 rl2_1       NaN       NaN  0.761038  0.121675
      rl2_2       NaN       NaN  0.443863  0.333674
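As far as I can tell, the holes appear because concatenating along axis=0 appends rows and aligns columns on their union, so a block that lacks a column gets NaN there and repeated row labels are kept rather than merged. A minimal sketch of that behaviour, using two hypothetical single-column frames (`left` and `right` are made up for illustration and are not part of my data):

```python
import pandas as pd

# Two toy frames that share row labels but have different columns
left = pd.DataFrame({'a': [1.0, 2.0]}, index=['r1', 'r2'])
right = pd.DataFrame({'b': [3.0, 4.0]}, index=['r1', 'r2'])

# Row-wise concat: rows are appended, columns become the union ('a', 'b')
stacked = pd.concat([left, right], axis=0)
print(stacked.index.is_unique)          # False: r1 and r2 each appear twice
print(int(stacked.isna().sum().sum()))  # 4: one NaN per row

# Column-wise concat aligns on the shared row labels instead
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side.shape)               # (2, 2)
```

The same thing seems to happen with my blocks, just with a MultiIndex on both axes.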
I could also run:
from functools import reduce
combined = reduce(lambda x, y: x.combine_first(y), subframes)
which nicely gives:
                cl1_1               cl1_2
                cl2_1     cl2_2     cl2_1     cl2_2
rl1_1 rl2_1  1.764052  0.400157 -0.103219  0.410599
      rl2_2  0.978738  2.240893  0.144044  1.454274
rl1_2 rl2_1  1.867558 -0.977278  0.761038  0.121675
      rl2_2  0.950088 -0.151357  0.443863  0.333674
Is this the best way to do this kind of thing? Is there a more widely accepted idiom, or something built in for this?
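For comparison, here are two other candidates I'm aware of, as a sketch rather than a recommendation. Both assume the blocks never overlap. The second additionally assumes `subframes` is ordered exactly as the loop above builds it (all row blocks for one column group, then the next); `n_row_blocks`, `column_strips`, `via_groupby` and `via_concat` are names I made up for this example.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
col_l1 = ['cl1_1', 'cl1_2']
col_l2 = ['cl2_1', 'cl2_2']
row_l1 = ['rl1_1', 'rl1_2']
row_l2 = ['rl2_1', 'rl2_2']

# Rebuild the same 2x2 blocks as in the question
subframes = []
for cl1 in col_l1:
    for rl1 in row_l1:
        idx = pd.MultiIndex.from_tuples([(rl1, r) for r in row_l2])
        cols = pd.MultiIndex.from_tuples([(cl1, c) for c in col_l2])
        subframes.append(
            pd.DataFrame(np.random.randn(2, 2), index=idx, columns=cols)
        )

# Candidate 1: stack everything, then collapse duplicated row labels.
# GroupBy.first() takes the first non-null value per column in each group,
# so overlapping blocks would be silently resolved in favour of the earlier one.
via_groupby = pd.concat(subframes, axis=0).groupby(level=[0, 1]).first()

# Candidate 2: build one vertical strip per column group, then join the
# strips side by side. This relies on the list order described above.
n_row_blocks = len(row_l1)
column_strips = [
    pd.concat(subframes[i:i + n_row_blocks], axis=0)
    for i in range(0, len(subframes), n_row_blocks)
]
via_concat = pd.concat(column_strips, axis=1)

print(via_groupby)
print(via_concat)
```

Neither looks obviously cleaner to me than the `combine_first` reduction, and the second one breaks if the list is shuffled, which is part of why I'm asking.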