Element-wise multiplication of two pandas DataFrames of different sizes, adding a dimension as an extra index/column

Date: 2017-09-16 22:26:17

Tags: python pandas dataframe

I want to multiply two pandas DataFrames of different sizes, df1 and df2.

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(6, 3), index=list("ABCDEF"), columns=list("XYZ"))
df1
df2 = pd.DataFrame(np.random.randn(6, 1), index=list("KLMNOP"))
df2

I would like:

df1*df2 # element-wise for all values in df1 and df2

The result should be a DataFrame with a hierarchical index, with the column names preserved, as shown below. How can I do this?

    X  Y  Z
K A
  B
  C
  D
  E
  F
L A
  B
  C
  D
  E
  F
  ...
P A
  B
  C
  D
  E
  F

I tried the following:

for x in df2:
 y=x*df1
 print(y)

But in this case I lose the dimension given by index=list("KLMNOP").

3 answers:

Answer 0 (score: 2)

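df = pd.DataFrame(
    # ravel() yields one scalar per row of df2
    pd.concat([n * df1 for n in df2.values.ravel()]).values,
    index=pd.MultiIndex.from_product([df2.index, df1.index]),
    columns=df1.columns)  # keep the X/Y/Z column labels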

Or, following @Wen's keys approach:

df = pd.concat([n * df1 for n in df2.values.ravel()], keys=df2.index.values)

Result:

>>> df
            X         Y         Z
K A  0.147213  0.186943 -0.200942
  B  0.536710 -0.886668 -0.334171
  C  0.501207  0.056768  0.246160
  D  0.662405 -0.186932  0.089652
  E -0.271139  0.244362 -0.008448
  F  0.039946  0.020961  0.161443
L A  0.363300  0.461348 -0.495895
  B  1.324524 -2.188169 -0.824685
  C  1.236907  0.140096  0.607488
  D  1.634719 -0.461321  0.221248
  E -0.669131  0.603050 -0.020849
  F  0.098581  0.051729  0.398418
M A -0.144267 -0.183202  0.196921
  B -0.525970  0.868925  0.327483
  C -0.491177 -0.055632 -0.241234
  D -0.649149  0.183191 -0.087858
  E  0.265713 -0.239472  0.008279
  F -0.039147 -0.020542 -0.158212
N A -0.360839 -0.458223  0.492536
  B -1.315552  2.173347  0.819098
  C -1.228528 -0.139147 -0.603373
  D -1.623646  0.458196 -0.219749
  E  0.664598 -0.598965  0.020707
  F -0.097913 -0.051378 -0.395719
O A  0.313399  0.397980 -0.427782
  B  1.142594 -1.887614 -0.711411
  C  1.067012  0.120853  0.524047
  D  1.410183 -0.397957  0.190858
  E -0.577223  0.520218 -0.017985
  F  0.085041  0.044624  0.343693
P A -0.594052 -0.754376  0.810867
  B -2.165804  3.578000  1.348489
  C -2.022537 -0.229078 -0.993339
  D -2.673023  0.754333 -0.361775
  E  1.094134 -0.986081  0.034091
  F -0.161196 -0.084585 -0.651476
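
As a quick sanity check, here is a minimal sketch comparing the two variants above. The fixed seed and the names factors, a and b are my own choices for illustration, not part of the original answer, so the numbers will differ from the output shown.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df1 = pd.DataFrame(rng.standard_normal((6, 3)),
                   index=list("ABCDEF"), columns=list("XYZ"))
df2 = pd.DataFrame(rng.standard_normal((6, 1)), index=list("KLMNOP"))

factors = df2[0].to_numpy()  # one scalar per row of df2

# Variant 1: stack the scaled copies, then attach the MultiIndex explicitly
a = pd.DataFrame(
    pd.concat([df1 * n for n in factors]).to_numpy(),
    index=pd.MultiIndex.from_product([df2.index, df1.index]),
    columns=df1.columns,
)

# Variant 2: let pd.concat build the outer index level from keys
b = pd.concat([df1 * n for n in factors], keys=df2.index)

same_values = np.allclose(a.to_numpy(), b.to_numpy())
print(same_values, a.shape, b.shape)
```

Both frames should have 36 rows (6 x 6) and the original three columns.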

Answer 1 (score: 2)

Use pd.concat with keys :)

ldf=[]
for i in df2[0]:
    ldf.append(df1*i)
target=pd.concat(ldf,axis=0,keys=df2.index.values)


target
Out[88]: 
            X         Y         Z
K A  0.068958  0.962846  0.691092
  B -0.262507  0.607219  1.079655
  C -0.391440  0.569737  0.365277
  D -0.229981 -0.277291  0.859837
  E -0.966434 -0.189392 -0.119505
  F -0.744944  0.315524  0.101557
L A  0.078607  1.097578  0.787797
  B -0.299239  0.692188  1.230732
  C -0.446215  0.649461  0.416390
  D -0.262162 -0.316093  0.980155
  E -1.101669 -0.215894 -0.136228
  F -0.849185  0.359675  0.115769
M A  0.043680  0.609898  0.437760
  B -0.166280  0.384633  0.683889
  C -0.247951  0.360890  0.231378
  D -0.145677 -0.175645  0.544649
  E -0.612171 -0.119967 -0.075698
  F -0.471872  0.199863  0.064330
N A -0.090919 -1.269487 -0.911187
  B  0.346108 -0.800603 -1.423497
  C  0.516104 -0.751183 -0.481608
  D  0.303224  0.365601 -1.133673
  E  1.274219  0.249709  0.157564
  F  0.982189 -0.416010 -0.133901
O A  0.075041  1.047780  0.752054
  B -0.285663  0.660783  1.174892
  C -0.425970  0.619994  0.397498
  D -0.250268 -0.301751  0.935685
  E -1.051685 -0.206099 -0.130047
  F -0.810656  0.343356  0.110516
P A -0.025643 -0.358041 -0.256988
  B  0.097615 -0.225799 -0.401478
  C  0.145560 -0.211861 -0.135831
  D  0.085520  0.103113 -0.319737
  E  0.359376  0.070427  0.044439
  F  0.277013 -0.117330 -0.037765
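
Once the outer level is in place, each block can be pulled back out by its df2 label. A small sketch, assuming deterministic toy values instead of random ones; the level names "df2_key" and "df1_key" are hypothetical labels I added for readability.

```python
import numpy as np
import pandas as pd

# Toy data (assumed for illustration): df1 holds 0..17, df2 holds 1..6
df1 = pd.DataFrame(np.arange(18.0).reshape(6, 3),
                   index=list("ABCDEF"), columns=list("XYZ"))
df2 = pd.DataFrame(np.arange(1.0, 7.0).reshape(6, 1), index=list("KLMNOP"))

target = pd.concat([df1 * i for i in df2[0]], keys=df2.index)
target = target.rename_axis(["df2_key", "df1_key"])  # optional level names

block_L = target.loc["L"]             # the 6x3 block for outer key "L"
single = target.loc[("L", "B"), "Y"]  # one cell: df1.loc["B", "Y"] * df2.loc["L", 0]
print(block_L)
print(single)
```

Here block_L is just df1 scaled by the "L" entry of df2, indexed by A..F again.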

Answer 2 (score: 1)

A demonstration with smaller DataFrames and a one-line solution:

In [292]: df1
Out[292]:
   X  Y  Z
A  2  1  4
B  0  0  0
C  1  3  2
D  2  0  2

In [293]: df2
Out[293]:
   0
K  0
L  4
M  3
N  2

In [299]: pd.DataFrame(np.concatenate(df2.values[:, None] * df1.values),
     ...:              pd.MultiIndex.from_product([df2.index, df1.index]),
     ...:              df1.columns)
     ...:
Out[299]:
     X   Y   Z
K A  0   0   0
  B  0   0   0
  C  0   0   0
  D  0   0   0
L A  8   4  16
  B  0   0   0
  C  4  12   8
  D  8   0   8
M A  6   3  12
  B  0   0   0
  C  3   9   6
  D  6   0   6
N A  4   2   8
  B  0   0   0
  C  2   6   4
  D  4   0   4
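
To make the broadcasting in the one-liner easier to follow, here is a sketch that splits it into steps, using the same small data. The intermediate names cube and flat are mine, and reshape is swapped in for np.concatenate; on a 3-D array the two should give the same 2-D result.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[2, 1, 4], [0, 0, 0], [1, 3, 2], [2, 0, 2]],
                   index=list("ABCD"), columns=list("XYZ"))
df2 = pd.DataFrame([0, 4, 3, 2], index=list("KLMN"))

# df2 values have shape (4, 1); [:, None] makes them (4, 1, 1).
# Broadcasting against df1's (4, 3) values gives (4, 4, 3):
# axis 0 follows df2's rows, axis 1 follows df1's rows, axis 2 the columns.
cube = df2.to_numpy()[:, None] * df1.to_numpy()

# Collapse the first two axes into 16 rows, df2-major order
flat = cube.reshape(-1, df1.shape[1])

res = pd.DataFrame(flat,
                   index=pd.MultiIndex.from_product([df2.index, df1.index]),
                   columns=df1.columns)
print(res.loc["L"])
```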

PS: make sure df2 is a pandas DataFrame, not a pandas Series.

You can convert a Series to a DataFrame with the .to_frame() method:

In [308]: s
Out[308]:
K    0
L    4
M    3
N    2
Name: 0, dtype: int32

In [310]: s = s.to_frame()

In [311]: s
Out[311]:
   0
K  0
L  4
M  3
N  2
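
The reason this matters for the one-liner is the array shape, as far as I can tell. A short sketch (the names wrong and right are only for illustration): with a Series the values are 1-D, so the extra axis is not enough to produce the outer dimension.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[2, 1, 4], [0, 0, 0], [1, 3, 2], [2, 0, 2]],
                   index=list("ABCD"), columns=list("XYZ"))
s = pd.Series([0, 4, 3, 2], index=list("KLMN"))

# Series: values are (4,), so [:, None] is (4, 1) and the product is
# only (4, 3): each row of df1 is scaled by one factor, no outer dimension.
wrong = s.to_numpy()[:, None] * df1.to_numpy()

# DataFrame: values are (4, 1), so [:, None] is (4, 1, 1) and the product
# is (4, 4, 3): every factor is applied to the whole of df1.
right = s.to_frame().to_numpy()[:, None] * df1.to_numpy()

print(wrong.shape, right.shape)
```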