Question

Given two pandas DataFrames, dfa and dfb, how can I ensure that the MultiIndex of each DataFrame contains all the rows from the other?

In [147]: dfa
Out[147]: 
        c
a b      
0 5  10.0
1 6  11.0
2 7  12.0
3 8  13.5
4 9  14.0

In [148]: dfb
Out[148]: 
      c
a b    
0 5  10
2 7  12
3 8  13
4 9  14

Here, dfb is missing the index (1, 6):

In [149]: dfa - dfb
Out[149]: 
       c
a b     
0 5  0.0
1 6  NaN
2 7  0.0
3 8  0.5
4 9  0.0

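...but dfa could just as well be missing indices from dfb. The missing indices should be inserted into each DataFrame with a value of 0.

In other words, the index of each DataFrame should be the union of the two MultiIndexes, with the added rows having a value of 0.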
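To make the question reproducible, the two frames can be rebuilt roughly as follows. The construction is an assumption (the question only shows the printed output); the last lines list which index entries each frame lacks relative to the other.

```python
import pandas as pd

# Assumed construction: the question shows only the printed frames.
dfa = pd.DataFrame(
    {"a": [0, 1, 2, 3, 4], "b": [5, 6, 7, 8, 9], "c": [10.0, 11.0, 12.0, 13.5, 14.0]}
).set_index(["a", "b"])
dfb = pd.DataFrame(
    {"a": [0, 2, 3, 4], "b": [5, 7, 8, 9], "c": [10, 12, 13, 14]}
).set_index(["a", "b"])

# Index entries present in one frame but absent from the other.
missing_in_dfb = dfa.index.difference(dfb.index)
missing_in_dfa = dfb.index.difference(dfa.index)
print(list(missing_in_dfb))
print(list(missing_in_dfa))
```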

Answer 1

I think you need DataFrame.sub with the fill_value parameter if you need to replace NaN with some value:

df = dfa.sub(dfb, fill_value=0)
print (df)
        c
a b      
0 5   0.0
1 6  11.0
2 7   0.0
3 8   0.5
4 9   0.0

df = dfb.sub(dfa, fill_value=0)
print (df)
         c
a b       
0 5    0.0
1 6  -11.0
2 7    0.0
3 8   -0.5
4 9    0.0
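The question notes that either frame may be missing rows. A minimal sketch with two small hypothetical frames (`left` and `right` are illustrative names, not from the question) suggests that `sub` with `fill_value=0` handles both directions in one call, since a row missing on either side is treated as 0 before subtracting:

```python
import pandas as pd

# Hypothetical frames: each one lacks a row that the other has.
idx_left = pd.MultiIndex.from_tuples([(0, 5), (1, 6)], names=["a", "b"])
idx_right = pd.MultiIndex.from_tuples([(0, 5), (2, 7)], names=["a", "b"])
left = pd.DataFrame({"c": [10.0, 11.0]}, index=idx_left)
right = pd.DataFrame({"c": [10.0, 12.0]}, index=idx_right)

# The result is indexed by the union of both indices.
diff = left.sub(right, fill_value=0)
print(diff)
```

Here (1, 6) exists only in `left`, so it is computed as 11.0 - 0, while (2, 7) exists only in `right` and is computed as 0 - 12.0.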

If you need the union of the indices, add reindex:

mux = dfa.index.union(dfb.index)
print (mux)
MultiIndex(levels=[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]],
           labels=[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]],
           names=['a', 'b'],
           sortorder=0)

print (dfa.reindex(mux, fill_value=0))
        c
a b      
0 5  10.0
1 6  11.0
2 7  12.0
3 8  13.5
4 9  14.0

print (dfb.reindex(mux, fill_value=0))
      c
a b    
0 5  10
1 6   0
2 7  12
3 8  13
4 9  14
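As a possible alternative to building the union by hand, `DataFrame.align` with an outer join returns both frames reindexed to the combined index in one step. This is a sketch rather than part of the original answer, and the frame construction below is assumed from the printed output in the question.

```python
import pandas as pd

# Assumed construction of the frames from the question.
dfa = pd.DataFrame(
    {"a": [0, 1, 2, 3, 4], "b": [5, 6, 7, 8, 9], "c": [10.0, 11.0, 12.0, 13.5, 14.0]}
).set_index(["a", "b"])
dfb = pd.DataFrame(
    {"a": [0, 2, 3, 4], "b": [5, 7, 8, 9], "c": [10, 12, 13, 14]}
).set_index(["a", "b"])

# Outer-join the row indices; rows missing from either frame are filled with 0.
dfa_full, dfb_full = dfa.align(dfb, join="outer", axis=0, fill_value=0)
print(dfa_full)
print(dfb_full)
```

After this call both frames share the same index, so a plain `dfa_full - dfb_full` no longer produces NaN for (1, 6).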

Answer 2

To expand to the Cartesian product of all occurring MultiIndex values, this works well:

from itertools import product

df = dfa.loc[0:2]
print(df)
        c
a b      
0 5  10.0
1 6  11.0
2 7  12.0

# build full cartesian product index
cpr_index = product(*(df.index.get_level_values(icol) for icol in df.index.names))
# and generate the missing elements, filling with -1
print(df.reindex(cpr_index, fill_value=-1))
        c
a b      
0 5  10.0
  6  -1.0
  7  -1.0
1 5  -1.0
  6  11.0
  7  -1.0
2 5  -1.0
  6  -1.0
  7  12.0

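Basically, this creates a full tensor or matrix filled with a default value. To fill only part of the population (for example, only values >= 1), the product must be built accordingly.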
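The same idea can be sketched with `pd.MultiIndex.from_product`, which keeps the level names and, combined with `unique()`, avoids repeated combinations when a level value occurs more than once. The partial variant below reads "all >= 1" as a filter on level `a`; that reading is an assumption, as are the variable names.

```python
import pandas as pd

# Assumed stand-in for dfa.loc[0:2] from the answer above.
idx = pd.MultiIndex.from_tuples([(0, 5), (1, 6), (2, 7)], names=["a", "b"])
df = pd.DataFrame({"c": [10.0, 11.0, 12.0]}, index=idx)

# Full Cartesian product of the distinct values seen in each level.
full_index = pd.MultiIndex.from_product(
    [df.index.get_level_values(name).unique() for name in df.index.names],
    names=df.index.names,
)
full = df.reindex(full_index, fill_value=-1)

# Partial product: restrict level "a" to values >= 1 before combining.
a_values = df.index.get_level_values("a").unique()
b_values = df.index.get_level_values("b").unique()
partial_index = pd.MultiIndex.from_product(
    [a_values[a_values >= 1], b_values], names=df.index.names
)
partial = df.reindex(partial_index, fill_value=-1)

print(full)
print(partial)
```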

Filling in missing DataFrame indices