Broadcasting mul() on levels of a MultiIndex

Time: 2015-07-15 16:08:46

Tags: pandas

I am trying to use a multiply operation with a MultiIndex.

import pandas as pd
import numpy as np

d = {'Alpha': [1,2,3,4,5,6,7,8,9]
   ,'Beta':tuple('ABCDEFGHI')
   ,'C': np.random.randint(1,10,9)
   ,'D': np.random.randint(100,200,9)
 }

df = pd.DataFrame(d)
df.set_index(['Alpha','Beta'],inplace=True)
df = df.stack() #it's now a series
df.index.names = df.index.names[:-1] + ['Gamma']

ser = pd.Series(data = np.random.rand(9))
ser.index = pd.MultiIndex.from_tuples(list(zip(range(1,10),np.repeat('C',9))))
ser.index.names = ['Alpha','Gamma']

print(df)
print(ser)

foo = df.mul(ser,axis=0,level = ['Alpha','Gamma'])

So my DataFrame, which is now a Series, looks like

Alpha  Beta  Gamma
1      A     C          7
             D        188
2      B     C          7
             D        110
3      C     C          2
             D        124
4      D     C          4
             D        153
5      E     C          9
             D        178
6      F     C          6
             D        196
7      G     C          1
             D        156
8      H     C          1
             D        184
9      I     C          3
             D        169

and my Series looks like

Alpha  Gamma
1      C       0.8731
2      C       0.6347
3      C       0.4688
4      C       0.5623
5      C       0.4944
6      C       0.5234
7      C       0.9946
8      C       0.7815
9      C       0.1219

In my multiply operation I want to broadcast on the index levels 'Alpha' and 'Gamma',

but I get this error message:

TypeError: Join on level between two MultiIndex objects is ambiguous

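3 Answers:

Answer 0: (score: 2)

How about this? Perhaps it is the extra 'Beta' level in df that is causing the problem?

(Note: this uses the updated df from @Dickster's answer, not the one in the original question.)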

df2 = df.reset_index().set_index(['Alpha','Gamma'])

df2[0].mul(ser)

Alpha  Gamma
1      C          2.503829
       D               NaN
2      C          5.028208
       D               NaN
3      C          0.842322
       D               NaN
4      C          0.198101
       D               NaN
5      C          0.800745
       D               NaN
6      C          1.936523
       D               NaN
7      C          2.507393
       D               NaN
8      C          4.846258
       D               NaN
9      C               NaN
       D        147.233378
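
The result above no longer carries the 'Beta' level. A minimal sketch of the same idea that puts 'Beta' back afterwards is shown here. The data is small and made up for illustration, and the names left, right, flat and restored are hypothetical. The value column is called "value" here, whereas the answer's df2[0] relies on the default column name 0 that reset_index gives an unnamed Series.

```python
import pandas as pd

# Made-up stand-in for the stacked Series from the question.
left = pd.Series(
    [2, 10, 4, 20, 6, 30],
    index=pd.MultiIndex.from_tuples(
        [(1, "A", "C"), (1, "A", "D"), (2, "B", "C"),
         (2, "B", "D"), (3, "C", "C"), (3, "C", "D")],
        names=["Alpha", "Beta", "Gamma"],
    ),
    name="value",
)

# Made-up stand-in for ser, indexed by (Alpha, Gamma) only.
right = pd.Series(
    [0.5, 0.25, 2.0],
    index=pd.MultiIndex.from_tuples(
        [(1, "C"), (2, "C"), (3, "D")], names=["Alpha", "Gamma"]
    ),
)

# Move 'Beta' into a column so both operands share the levels (Alpha, Gamma).
flat = left.reset_index().set_index(["Alpha", "Gamma"])
product = flat["value"].mul(right)

# Write the product back next to 'Beta', then rebuild the three-level index.
restored = (
    flat.assign(value=product)
    .set_index("Beta", append=True)
    .reorder_levels(["Alpha", "Beta", "Gamma"])["value"]
)
print(restored)
```

This only works cleanly when (Alpha, Gamma) is unique once 'Beta' is removed, which holds for the data in the question because each Alpha has exactly one Beta.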

Answer 1: (score: 1)

Suppose I have this, where the Series "ser" now has one 'D' in Gamma:

import pandas as pd
import numpy as np

np.random.seed(1)
d = {'Alpha': [1,2,3,4,5,6,7,8,9]
   ,'Beta':tuple('ABCDEFGHI')
   ,'C': np.random.randint(1,10,9)
   ,'D': np.random.randint(100,200,9)
 }

df = pd.DataFrame(d)
df.set_index(['Alpha','Beta'],inplace=True)
df = df.stack() #it's now a series
df.index.names = df.index.names[:-1] + ['Gamma']

ser = pd.Series(data = np.random.rand(9))


idx = list(np.repeat('C',8))
idx.append('D')

ser.index = pd.MultiIndex.from_tuples(list(zip(range(1,10),idx)))
ser.index.names = ['Alpha','Gamma']

print(df)
print(ser)

df_A = df.unstack('Alpha').mul(ser).stack('Alpha').reorder_levels(df.index.names)
print(df_A)


df_dickster77 = df.unstack('Alpha').mul(ser.unstack('Alpha')).stack('Alpha').reorder_levels(df.index.names)
print(df_dickster77)

The output looks like this:

Alpha  Beta  Gamma
1      A     C          6
             D        120
2      B     C          9
             D        118
3      C     C          6
             D        184
4      D     C          1
             D        111
5      E     C          1
             D        128
6      F     C          2
             D        129
7      G     C          8
             D        114
8      H     C          7
             D        150
9      I     C          3
             D        168
dtype: int32
Alpha  Gamma
1      C        0.417305
2      C        0.558690
3      C        0.140387
4      C        0.198101
5      C        0.800745
6      C        0.968262
7      C        0.313424
8      C        0.692323
9      D        0.876389
dtype: float64

Output A: unintended multiplication

Gamma                      C           D
Alpha Beta Gamma                        
1     A    C        2.503829         NaN
           D       50.076576         NaN
2     B    C        5.028208         NaN
           D       65.925400         NaN
3     C    C        0.842322         NaN
           D       25.831197         NaN
4     D    C        0.198101         NaN
           D       21.989265         NaN
5     E    C        0.800745         NaN
           D      102.495305         NaN
6     F    C        1.936523         NaN
           D      124.905743         NaN
7     G    C        2.507393         NaN
           D       35.730356         NaN
8     H    C        4.846258         NaN
           D      103.848392         NaN
9     I    C             NaN    2.629167
           D             NaN  147.233378

Output df_dickster77: the multiplication lines up correctly on the C rows and the D row. However, the 8 NaN rows for D and the 1 NaN row for C are lost.

Alpha  Beta  Gamma
1      A     C          2.503829
2      B     C          5.028208
3      C     C          0.842322
4      D     C          0.198101
5      E     C          0.800745
6      F     C          1.936523
7      G     C          2.507393
8      H     C          4.846258
9      I     D        147.233378
dtype: float64
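
If the missing rows are wanted back as NaN, one option is to reindex the compact result against the original index. The sketch below assumes the rows disappeared because stack() dropped missing values by default in the pandas of that time. The data is made up, and compact, full_index and padded are hypothetical names.

```python
import pandas as pd

names = ["Alpha", "Beta", "Gamma"]

# Made-up full index, standing in for df.index in the answer above.
full_index = pd.MultiIndex.from_tuples(
    [(1, "A", "C"), (1, "A", "D"), (2, "B", "C"),
     (2, "B", "D"), (3, "C", "C"), (3, "C", "D")],
    names=names,
)

# Made-up compact result holding only the rows that found a match.
compact = pd.Series(
    [1.0, 1.0, 60.0],
    index=pd.MultiIndex.from_tuples(
        [(1, "A", "C"), (2, "B", "C"), (3, "C", "D")], names=names
    ),
)

# Reindexing against the full index reinserts the unmatched rows as NaN.
padded = compact.reindex(full_index)
print(padded)
```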

Answer 2: (score: 0)

This is the way to do it ATM. At some point a more concise approach may be implemented.

In [21]: df.unstack('Alpha').mul(ser).stack('Alpha').reorder_levels(df.index.names)
Out[21]: 
Gamma                      C
Alpha Beta Gamma            
1     A    C        6.761867
           D      171.944612
2     B    C        0.154139
           D        6.371062
3     C    C        2.311870
           D       42.898041
4     D    C        0.390920
           D        9.479801
5     E    C        3.484439
           D       72.011743
6     F    C        0.740913
           D       50.382061
7     G    C        3.459497
           D       60.541203
8     H    C        0.467012
           D       19.030741
9     I    C        0.071290
           D       11.620286
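
A different route, not taken from any of the answers, avoids unstack and stack altogether: drop 'Beta' from a copy of the index, look up the matching factor for every row, and multiply by position. This is only a sketch under the assumption that (Alpha, Gamma) is unique in the smaller Series. The data is made up, and left, right, factors and result are hypothetical names.

```python
import pandas as pd

# Made-up stand-in for the stacked Series, indexed by (Alpha, Beta, Gamma).
left = pd.Series(
    [2, 10, 4, 20, 6, 30],
    index=pd.MultiIndex.from_tuples(
        [(1, "A", "C"), (1, "A", "D"), (2, "B", "C"),
         (2, "B", "D"), (3, "C", "C"), (3, "C", "D")],
        names=["Alpha", "Beta", "Gamma"],
    ),
)

# Made-up stand-in for ser, indexed by (Alpha, Gamma).
right = pd.Series(
    [0.5, 0.25, 2.0],
    index=pd.MultiIndex.from_tuples(
        [(1, "C"), (2, "C"), (3, "D")], names=["Alpha", "Gamma"]
    ),
)

# One (Alpha, Gamma) key per row of left, in the same row order as left.
keys = left.index.droplevel("Beta")

# Look up a factor for each row; keys missing from right become NaN.
factors = right.reindex(keys)

# Multiply by position so the original three-level index is kept as is.
result = left * factors.to_numpy()
print(result)
```

Because the multiplication is positional, unmatched rows stay in the result as NaN and the level order of the original index is untouched.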