I have two large matrices (each more than several thousand rows), and I need to compute their "sumproduct()".
The matrices look like this:
import pandas as pd
d1 = {'item':['A','B','C'], 't1':[10,5,10], 't2':[10,10,5], 't3': [100, 0, 0], 't4':[0,100,100]}
data_frame1 = pd.DataFrame(data = d1)
item t1 t2 t3 t4
A 10 10 100 0
B 5 10 0 100
C 10 5 0 100
d2 = {'scenario':[1,1,1,2,2,2], 'step':[1,2,3,1,2,3], 't1':[0.97,0.98,0.99,0.972,0.979,0.991], 't2':[0.960,0.964,0.964,0.972,0.977,0.985], 't3': [0.950,0.956,0.965,0.967,0.970,0.980], 't4':[0.945,0.951,0.955,0.962,0.964,0.973]}
data_frame2 = pd.DataFrame(data = d2)
scenario step t1 t2 t3 t4
1 1 0.97 0.960 0.950 0.945
1 2 0.98 0.964 0.956 0.951
1 3 0.99 0.964 0.965 0.955
2 1 0.972 0.972 0.967 0.962
2 2 0.979 0.977 0.970 0.964
2 3 0.991 0.985 0.980 0.973
The expected output is (result is the sum of products over t1 to t4 for every combination of item, scenario, and step):
item scenario step result
A 1 1 114.30
A 1 2 115.01
A 1 3 116.07
A 2 1 116.16
A 2 2 116.59
A 2 3 117.75
B 1 1 108.95
B 1 2 109.69
B 1 3 110.09
B 2 1 110.77
B 2 2 111.06
B 2 3 112.14
C 1 1 109.00
C 1 2 109.77
C 1 3 110.22
C 2 1 110.77
C 2 2 111.07
C 2 3 112.17
Is there a Pythonic way to do this? (I tried using for loops, but that takes too long.)
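To make the requested calculation concrete, here is a minimal sketch that works out a single cell by hand: item A against scenario 1, step 1. The names `weights` and `factors` are illustrative only and do not appear in the question.

```python
# Item A's row from data_frame1 (t1..t4) and the scenario 1 / step 1 row from data_frame2.
weights = [10, 10, 100, 0]
factors = [0.97, 0.960, 0.950, 0.945]

# Sum of element-wise products across t1..t4.
result = sum(w * f for w, f in zip(weights, factors))
print(round(result, 2))  # 114.3
```

Every other row of the output repeats this for a different (item, scenario, step) combination, which is the same thing a matrix product computes in one step.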
Answer 0 (score: 4)
You can use dot + melt:
df1 = data_frame1.set_index('item')
df1
      t1  t2   t3   t4
item
A     10  10  100    0
B      5  10    0  100
C     10   5    0  100
df2 = data_frame2.set_index(['scenario', 'step'])
df2
                  t1     t2     t3     t4
scenario step
1        1     0.970  0.960  0.950  0.945
         2     0.980  0.964  0.956  0.951
         3     0.990  0.964  0.965  0.955
2        1     0.972  0.972  0.967  0.962
         2     0.979  0.977  0.970  0.964
         3     0.991  0.985  0.980  0.973
(df1.dot(df2.T)
    .reset_index()
    .melt('item', value_name='result')
    .sort_values(['item', 'scenario', 'step']))
   item  scenario  step   result
0     A         1     1  114.300
3     A         1     2  115.040
6     A         1     3  116.040
9     A         2     1  116.140
12    A         2     2  116.560
15    A         2     3  117.760
1     B         1     1  108.950
4     B         1     2  109.640
7     B         1     3  110.090
10    B         2     1  110.780
13    B         2     2  111.065
16    B         2     3  112.105
2     C         1     1  109.000
5     C         1     2  109.720
8     C         1     3  110.220
11    C         2     1  110.780
14    C         2     2  111.075
17    C         2     3  112.135
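As a variant on the same idea, the sketch below takes the product the other way round, so that the columns being melted form a plain single-level index named after the items. It assumes the question's sample data. The name `long_result` is made up for illustration, and I am not certain that melting MultiIndex columns behaves identically across pandas versions, which is why this version avoids it.

```python
import pandas as pd

d1 = {'item': ['A', 'B', 'C'], 't1': [10, 5, 10], 't2': [10, 10, 5],
      't3': [100, 0, 0], 't4': [0, 100, 100]}
d2 = {'scenario': [1, 1, 1, 2, 2, 2], 'step': [1, 2, 3, 1, 2, 3],
      't1': [0.97, 0.98, 0.99, 0.972, 0.979, 0.991],
      't2': [0.960, 0.964, 0.964, 0.972, 0.977, 0.985],
      't3': [0.950, 0.956, 0.965, 0.967, 0.970, 0.980],
      't4': [0.945, 0.951, 0.955, 0.962, 0.964, 0.973]}

df1 = pd.DataFrame(d1).set_index('item')
df2 = pd.DataFrame(d2).set_index(['scenario', 'step'])

long_result = (
    df2.dot(df1.T)        # rows: (scenario, step); columns: one per item
       .reset_index()     # scenario and step become ordinary columns
       .melt(id_vars=['scenario', 'step'], var_name='item', value_name='result')
       [['item', 'scenario', 'step', 'result']]
       .sort_values(['item', 'scenario', 'step'])
       .reset_index(drop=True)
)
print(long_result)
```

The final `reset_index(drop=True)` only renumbers the rows 0 to 17; drop it if the original melt order is useful.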
Answer 1 (score: 1)
We can use np.sum and multiply to rebuild your expected output.
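import numpy as np
# For each item row, multiply it element-wise against every scenario/step row, then sum across t1..t4
a = [np.sum(np.multiply(data_frame2.iloc[:, 2:].values, value), 1) for value in data_frame1.iloc[:, 1:].values]
# Repeat the scenario/step labels once per item, and each item label once per scenario/step row
d3 = {'scenario': [1,1,1,2,2,2]*len(data_frame1), 'step': [1,2,3,1,2,3]*len(data_frame1), 'item': np.repeat(['A','B','C'], len(data_frame2)), 'result': np.concatenate(a)}
df3 = pd.DataFrame(d3)
df3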
Out[678]:
   item   result  scenario  step
0     A  114.300         1     1
1     A  115.040         1     2
2     A  116.040         1     3
3     A  116.140         2     1
4     A  116.560         2     2
5     A  117.760         2     3
6     B  108.950         1     1
7     B  109.640         1     2
8     B  110.090         1     3
9     B  110.780         2     1
10    B  111.065         2     2
11    B  112.105         2     3
12    C  109.000         1     1
13    C  109.720         1     2
14    C  110.220         1     3
15    C  110.780         2     1
16    C  111.075         2     2
17    C  112.135         2     3