我尝试添加两个具有Multiindex Columns和不同索引大小的数据帧。什么是最优雅的解决方案。例如:
names = ['Level 0', 'Level 1']
cols1 = pd.MultiIndex.from_arrays([['A', 'A', 'B'],['A1', 'A2', 'B1']], names = names)
cols2 = pd.MultiIndex.from_arrays([['A', 'A', 'B'],['A1', 'A3', 'B1']], names = names)
df1 = pd.DataFrame(np.random.randn(1, 3), index=range(1), columns=cols1)
df2 = pd.DataFrame(np.random.randn(5, 3), index=range(5), columns=cols2)
print(df1)
print(df2)
Level 0 A B
Level 1 A1 A2 B1
0 -0.116975 -0.391591 0.446029
Level 0 A B
Level 1 A1 A3 B1
0 1.179689 0.693096 -0.102621
1 -0.913441 0.187332 1.465217
2 -0.089724 -1.907706 -0.963699
3 0.203217 -1.233399 0.006726
4 0.218911 -0.027446 0.982764
现在我尝试将df1添加到df2,其逻辑是刚刚添加了缺失列,并且df1的索引0被添加到df2中的所有索引。
所以我期望上面的数字:
Level 0 A B
Level 1 A1 A2 A3 B1
0 1.062714 -0.391591 0.693096 0.343408
1 -1.030416 -0.391591 0.187332 1.911246
2 -0.206699 -0.391591 -1.907706 -0.51767
3 0.086242 -0.391591 -1.233399 0.452755
4 0.101936 -0.391591 -0.027446 1.428793
速度和内存效率最高的解决方案是什么?任何帮助表示赞赏。
答案 0 :(得分:5)
设置
In [76]: df1
Out[76]:
Level 0 A B
Level 1 A1 A2 B1
0 -0.28667 1.852091 -0.134793
In [77]: df2
Out[77]:
Level 0 A B
Level 1 A1 A3 B1
0 -0.023582 -0.713594 0.487355
1 0.628819 0.764721 -1.118777
2 -0.572421 1.326448 -0.788531
3 -0.160608 1.985142 0.344845
4 -0.184555 -1.075794 0.630975
这将对齐帧并将nan填充为0 但不是广播
In [63]: df1a,df2a = df1.align(df2,fill_value=0)
In [64]: df1a+df2a
Out[64]:
Level 0 A B
Level 1 A1 A2 A3 B1
0 -0.310253 1.852091 -0.713594 0.352561
1 0.628819 0.000000 0.764721 -1.118777
2 -0.572421 0.000000 1.326448 -0.788531
3 -0.160608 0.000000 1.985142 0.344845
4 -0.184555 0.000000 -1.075794 0.630975
这是广播第一个
的方式In [65]: df1a,df2a = df1.align(df2)
In [66]: df1a.ffill().fillna(0) + df2a.fillna(0)
Out[66]:
Level 0 A B
Level 1 A1 A2 A3 B1
0 -0.310253 1.852091 -0.713594 0.352561
1 0.342149 1.852091 0.764721 -1.253570
2 -0.859091 1.852091 1.326448 -0.923324
3 -0.447278 1.852091 1.985142 0.210052
4 -0.471226 1.852091 -1.075794 0.496181