A cleaner way to use combine_first on multiple pandas DataFrames

Asked: 2014-01-24 16:15:28

Tags: python pandas

Say I have several 2x2 "blocks" that I want to combine into a single DataFrame:

import pandas as pd
import numpy as np

np.random.seed(0)

col_l1 = ['cl1_1', 'cl1_2']
col_l2 = ['cl2_1', 'cl2_2']
row_l1 = ['rl1_1', 'rl1_2']
row_l2 = ['rl2_1', 'rl2_2']

subframes = []
for cl1 in col_l1:
    for rl1 in row_l1:
        idx = pd.MultiIndex.from_tuples([(rl1, r) for r in row_l2])
        cols = pd.MultiIndex.from_tuples([(cl1, c) for c in col_l2])
        data = np.random.randn(2, 2)
        df = pd.DataFrame(data=data, index=idx, columns=cols)
        subframes.append(df)

I could of course do this:

combined = pd.concat(subframes, axis=0)

But the result is left with "holes":

                cl1_1               cl1_2          
                cl2_1     cl2_2     cl2_1     cl2_2
rl1_1 rl2_1  1.764052  0.400157       NaN       NaN
      rl2_2  0.978738  2.240893       NaN       NaN
rl1_2 rl2_1  1.867558 -0.977278       NaN       NaN
      rl2_2  0.950088 -0.151357       NaN       NaN
rl1_1 rl2_1       NaN       NaN -0.103219  0.410599
      rl2_2       NaN       NaN  0.144044  1.454274
rl1_2 rl2_1       NaN       NaN  0.761038  0.121675
      rl2_2       NaN       NaN  0.443863  0.333674
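
The holes come from how `pd.concat` with `axis=0` behaves: it stacks the frames vertically and takes the union of their columns, so each block fills only its own column group and every row label ends up appearing twice. A small check of that, rebuilding the same blocks so it runs on its own (the name `stacked` is only illustrative):

```python
import numpy as np
import pandas as pd

np.random.seed(0)

# Rebuild the same four 2x2 blocks as above.
subframes = []
for cl1 in ['cl1_1', 'cl1_2']:
    for rl1 in ['rl1_1', 'rl1_2']:
        idx = pd.MultiIndex.from_tuples([(rl1, r) for r in ['rl2_1', 'rl2_2']])
        cols = pd.MultiIndex.from_tuples([(cl1, c) for c in ['cl2_1', 'cl2_2']])
        subframes.append(pd.DataFrame(np.random.randn(2, 2), index=idx, columns=cols))

stacked = pd.concat(subframes, axis=0)

print(stacked.shape)                    # (8, 4): 8 rows, 4 columns
print(stacked.index.is_unique)          # False: each row label occurs twice
print(int(stacked.isna().sum().sum()))  # 16: half of the 32 cells are NaN
```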

I could also run:

from functools import reduce  # a builtin in Python 2, but must be imported in Python 3
combined = reduce(lambda x, y: x.combine_first(y), subframes)

which nicely gives:

                cl1_1               cl1_2          
                cl2_1     cl2_2     cl2_1     cl2_2
rl1_1 rl2_1  1.764052  0.400157 -0.103219  0.410599
      rl2_2  0.978738  2.240893  0.144044  1.454274
rl1_2 rl2_1  1.867558 -0.977278  0.761038  0.121675
      rl2_2  0.950088 -0.151357  0.443863  0.333674

Is this the best way to do this kind of thing? Is there a more generally accepted idiom, or something built in for this?
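
For comparison, here are two possible alternatives, offered as a sketch and not as an accepted idiom. The first stacks all blocks with `pd.concat` and then collapses the duplicate row labels with `groupby(level=[0, 1]).first()`, which keeps the first non-null value per column. The second concatenates in two passes and assumes the blocks do not overlap and that the order of `subframes` is known. The names `by_reduce`, `by_groupby`, `by_nested`, `column_groups` and the helper `same_frame` are hypothetical, introduced only for this example.

```python
from functools import reduce

import numpy as np
import pandas as pd

np.random.seed(0)

# Rebuild the same four 2x2 blocks as in the question.
subframes = []
for cl1 in ['cl1_1', 'cl1_2']:
    for rl1 in ['rl1_1', 'rl1_2']:
        idx = pd.MultiIndex.from_tuples([(rl1, r) for r in ['rl2_1', 'rl2_2']])
        cols = pd.MultiIndex.from_tuples([(cl1, c) for c in ['cl2_1', 'cl2_2']])
        subframes.append(pd.DataFrame(np.random.randn(2, 2), index=idx, columns=cols))

# Reference result from the question.
by_reduce = reduce(lambda x, y: x.combine_first(y), subframes)

# Variant 1: stack everything, then collapse duplicate row labels,
# keeping the first non-null value in each column.
by_groupby = pd.concat(subframes, axis=0).groupby(level=[0, 1]).first()

# Variant 2: stack the row blocks of each column group, then join the
# column groups side by side. Assumes subframes[0:2] belong to 'cl1_1'
# and subframes[2:4] to 'cl1_2', which is how the loop above orders them.
column_groups = [pd.concat(subframes[i:i + 2], axis=0) for i in (0, 2)]
by_nested = pd.concat(column_groups, axis=1)


def same_frame(a, b):
    """Hypothetical helper: compare row labels, column labels and values."""
    return (a.index.tolist() == b.index.tolist()
            and a.columns.tolist() == b.columns.tolist()
            and np.array_equal(a.values, b.values))


print(same_frame(by_groupby, by_reduce))
print(same_frame(by_nested, by_reduce))
```

Variant 1 resolves overlapping cells the same way as the `reduce` chain (earlier frames win), while variant 2 is only appropriate when the blocks tile the result without overlap. Whether either is preferable to `combine_first` on real data is worth checking, for example with `pd.testing.assert_frame_equal`.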

0 answers:

No answers yet