I want to multiply two pandas DataFrames of different sizes, df1 and df2.
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randn(6, 3), index=list("ABCDEF"), columns=list("XYZ"))
df1
df2 = pd.DataFrame(np.random.randn(6, 1), index=list("KLMNOP"))
df2
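What I would like is:
df1*df2 # element-wise for all values in df1 and df2
The result should be a DataFrame with a hierarchical index in which the column names are preserved, as shown below. How can I do this?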
     X  Y  Z
K A
  B
  C
  D
  E
  F
L A
  B
  C
  D
  E
  F
...
P A
  B
  C
  D
  E
  F
I tried the following:
for x in df2:
    y = x * df1
    print(y)
But in this case I lose the dimension given by index = list("KLMNOP").
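A small sketch of why that loop drops the K..P dimension, assuming tiny stand-in frames (the values below are made up for illustration): iterating over a DataFrame yields its column labels, so the loop body runs once with x equal to the label 0 rather than once per row of df2.

```python
import numpy as np
import pandas as pd

# Hypothetical small frames, used only to illustrate the behaviour.
df1 = pd.DataFrame(np.ones((2, 3)), index=list("AB"), columns=list("XYZ"))
df2 = pd.DataFrame([[2.0], [3.0]], index=list("KL"))

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [x for x in df2]

# So the loop runs once, with x being the column label 0.
products = [x * df1 for x in df2]

# Iterating over the column itself gives one (index label, value) pair per row.
row_items = list(df2[0].items())
```

Looping over df2[0] (or df2[0].items()) instead gives one factor per index label of df2, which is what the answers below build on.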
Answer 0 (score: 2)
df = pd.DataFrame(
    pd.concat([n * df1 for n in df2.iloc[:, 0]]).values,  # one scaled copy of df1 per value in df2
    index=pd.MultiIndex.from_product([df2.index, df1.index]),
    columns=df1.columns)  # keep the X, Y, Z column names
Alternatively, following @Wen's keys approach:
df = pd.concat([n * df1 for n in df2.iloc[:, 0]], keys=df2.index)
Result:
>>> df
X Y Z
K A 0.147213 0.186943 -0.200942
B 0.536710 -0.886668 -0.334171
C 0.501207 0.056768 0.246160
D 0.662405 -0.186932 0.089652
E -0.271139 0.244362 -0.008448
F 0.039946 0.020961 0.161443
L A 0.363300 0.461348 -0.495895
B 1.324524 -2.188169 -0.824685
C 1.236907 0.140096 0.607488
D 1.634719 -0.461321 0.221248
E -0.669131 0.603050 -0.020849
F 0.098581 0.051729 0.398418
M A -0.144267 -0.183202 0.196921
B -0.525970 0.868925 0.327483
C -0.491177 -0.055632 -0.241234
D -0.649149 0.183191 -0.087858
E 0.265713 -0.239472 0.008279
F -0.039147 -0.020542 -0.158212
N A -0.360839 -0.458223 0.492536
B -1.315552 2.173347 0.819098
C -1.228528 -0.139147 -0.603373
D -1.623646 0.458196 -0.219749
E 0.664598 -0.598965 0.020707
F -0.097913 -0.051378 -0.395719
O A 0.313399 0.397980 -0.427782
B 1.142594 -1.887614 -0.711411
C 1.067012 0.120853 0.524047
D 1.410183 -0.397957 0.190858
E -0.577223 0.520218 -0.017985
F 0.085041 0.044624 0.343693
P A -0.594052 -0.754376 0.810867
B -2.165804 3.578000 1.348489
C -2.022537 -0.229078 -0.993339
D -2.673023 0.754333 -0.361775
E 1.094134 -0.986081 0.034091
F -0.161196 -0.084585 -0.651476
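A minimal self-contained sketch of the two variants above, for checking that they agree. The seeded generator is an assumption added for reproducibility, and the loop iterates over the scalars in df2[0] so that each factor is a plain number.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed: an assumption, not part of the answer
df1 = pd.DataFrame(rng.standard_normal((6, 3)), index=list("ABCDEF"), columns=list("XYZ"))
df2 = pd.DataFrame(rng.standard_normal((6, 1)), index=list("KLMNOP"))

# One scaled copy of df1 for every value in df2's single column.
pieces = [n * df1 for n in df2[0]]

# Variant 1: stack the copies, then attach a product MultiIndex and the column names.
by_product = pd.DataFrame(
    pd.concat(pieces).values,
    index=pd.MultiIndex.from_product([df2.index, df1.index]),
    columns=df1.columns,
)

# Variant 2: let pd.concat build the outer index level from keys.
by_keys = pd.concat(pieces, keys=df2.index)

# Spot check: row ("L", "B") should be df1's row "B" scaled by df2's value at "L".
expected_row = df2.loc["L", 0] * df1.loc["B"]
```

Both frames have 36 rows (6 x 6) and the three original columns; by_keys.loc[("L", "B")] can be compared against expected_row.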
Answer 1 (score: 2)
Use pd.concat with keys :)
ldf = []
for i in df2[0]:
    ldf.append(df1 * i)
target = pd.concat(ldf, axis=0, keys=df2.index.values)
target
Out[88]:
X Y Z
K A 0.068958 0.962846 0.691092
B -0.262507 0.607219 1.079655
C -0.391440 0.569737 0.365277
D -0.229981 -0.277291 0.859837
E -0.966434 -0.189392 -0.119505
F -0.744944 0.315524 0.101557
L A 0.078607 1.097578 0.787797
B -0.299239 0.692188 1.230732
C -0.446215 0.649461 0.416390
D -0.262162 -0.316093 0.980155
E -1.101669 -0.215894 -0.136228
F -0.849185 0.359675 0.115769
M A 0.043680 0.609898 0.437760
B -0.166280 0.384633 0.683889
C -0.247951 0.360890 0.231378
D -0.145677 -0.175645 0.544649
E -0.612171 -0.119967 -0.075698
F -0.471872 0.199863 0.064330
N A -0.090919 -1.269487 -0.911187
B 0.346108 -0.800603 -1.423497
C 0.516104 -0.751183 -0.481608
D 0.303224 0.365601 -1.133673
E 1.274219 0.249709 0.157564
F 0.982189 -0.416010 -0.133901
O A 0.075041 1.047780 0.752054
B -0.285663 0.660783 1.174892
C -0.425970 0.619994 0.397498
D -0.250268 -0.301751 0.935685
E -1.051685 -0.206099 -0.130047
F -0.810656 0.343356 0.110516
P A -0.025643 -0.358041 -0.256988
B 0.097615 -0.225799 -0.401478
C 0.145560 -0.211861 -0.135831
D 0.085520 0.103113 -0.319737
E 0.359376 0.070427 0.044439
F 0.277013 -0.117330 -0.037765
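Once the frame has the two-level index, individual blocks can be pulled back out. The sketch below assumes small integer frames and uses the names argument of pd.concat to label the levels; "outer" and "inner" are hypothetical level names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical small frames so the numbers are easy to follow.
df1 = pd.DataFrame(np.arange(6).reshape(2, 3), index=list("AB"), columns=list("XYZ"))
df2 = pd.DataFrame([[10], [100]], index=list("KL"))

# Same concat-with-keys approach, with the two index levels named.
target = pd.concat(
    [df1 * i for i in df2[0]],
    keys=df2.index,
    names=["outer", "inner"],
)

block_l = target.loc["L"]               # the copy of df1 scaled by df2.loc["L", 0]
row_b = target.xs("B", level="inner")   # df1's row "B" under every label of df2
```

Naming the levels is optional; it mainly makes xs and groupby calls on the result easier to read.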
Answer 2 (score: 1)
A demonstration with smaller DataFrames and a one-line solution:
In [292]: df1
Out[292]:
X Y Z
A 2 1 4
B 0 0 0
C 1 3 2
D 2 0 2
In [293]: df2
Out[293]:
0
K 0
L 4
M 3
N 2
In [299]: pd.DataFrame(np.concatenate(df2.values[:, None] * df1.values),
...: pd.MultiIndex.from_product([df2.index, df1.index]),
...: df1.columns)
...:
Out[299]:
X Y Z
K A 0 0 0
B 0 0 0
C 0 0 0
D 0 0 0
L A 8 4 16
B 0 0 0
C 4 12 8
D 8 0 8
M A 6 3 12
B 0 0 0
C 3 9 6
D 6 0 6
N A 4 2 8
B 0 0 0
C 2 6 4
D 4 0 4
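The one-liner relies on NumPy broadcasting. Here is the same computation split into steps with the intermediate shapes, using the small frames shown above; the variable names scale, cube and flat are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[2, 1, 4], [0, 0, 0], [1, 3, 2], [2, 0, 2]],
                   index=list("ABCD"), columns=list("XYZ"))
df2 = pd.DataFrame([0, 4, 3, 2], index=list("KLMN"))

scale = df2.values[:, None]   # shape (4, 1, 1): one factor per row of df2
cube = scale * df1.values     # broadcasts to (4, 4, 3): one scaled copy of df1 per factor
flat = np.concatenate(cube)   # joins the copies along axis 0, giving shape (16, 3)

result = pd.DataFrame(flat,
                      index=pd.MultiIndex.from_product([df2.index, df1.index]),
                      columns=df1.columns)
```

Because the whole product is computed in one NumPy operation, this avoids building a Python list of intermediate DataFrames.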
PS: make sure df2 is a pandas.DataFrame, not a pandas.Series.
You can convert a Series to a DataFrame with the .to_frame() method:
In [308]: s
Out[308]:
K 0
L 4
M 3
N 2
Name: 0, dtype: int32
In [310]: s = s.to_frame()
In [311]: s
Out[311]:
0
K 0
L 4
M 3
N 2
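A short sketch of why the distinction matters for the broadcasting one-liner, assuming the same small Series: the values of a Series are one-dimensional, so [:, None] yields shape (4, 1) instead of the (4, 1, 1) the one-liner needs. The column name "w" is a hypothetical name for illustration.

```python
import pandas as pd

s = pd.Series([0, 4, 3, 2], index=list("KLMN"))

series_shape = s.values[:, None].shape   # (4, 1): one axis short for the one-liner

as_frame = s.to_frame()                  # single column, labelled 0 for an unnamed Series
frame_shape = as_frame.values[:, None].shape   # (4, 1, 1): what the one-liner expects

as_named = s.to_frame(name="w")          # same data, with an explicit column name
```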