Pythonic way to compute the SumProduct of large matrices?

Posted: 2017-11-27 00:55:24

Tags: python pandas

I have two large matrices (several thousand rows or more), and I need to compute their "sumproduct()".

The matrices look like this:

import pandas as pd

d1 = {'item':['A','B','C'], 't1':[10,5,10], 't2':[10,10,5], 't3': [100, 0, 0], 't4':[0,100,100]}
data_frame1 = pd.DataFrame(data = d1)

item t1 t2  t3  t4
A   10  10  100 0
B   5   10  0   100
C   10  5   0   100



d2 = {'scenario':[1,1,1,2,2,2], 'step':[1,2,3,1,2,3], 't1':[0.97,0.98,0.99,0.972,0.979,0.991], 't2':[0.960,0.964,0.964,0.972,0.977,0.985], 't3': [0.950,0.956,0.965,0.967,0.970,0.980], 't4':[0.945,0.951,0.955,0.962,0.964,0.973]}
data_frame2 = pd.DataFrame(data = d2)

scenario step   t1  t2  t3  t4
1   1   0.97    0.960   0.950   0.945
1   2   0.98    0.964   0.956   0.951
1   3   0.99    0.964   0.965   0.955
2   1   0.972   0.972   0.967   0.962
2   2   0.979   0.977   0.970   0.964
2   3   0.991   0.985   0.980   0.973

The expected output is below (for every combination of item, scenario and step, the result is the sumproduct over columns t1 to t4):

item scenario step result
A   1   1   114.30
A   1   2   115.01
A   1   3   116.07
A   2   1   116.16
A   2   2   116.59
A   2   3   117.75
B   1   1   108.95
B   1   2   109.69
B   1   3   110.09
B   2   1   110.77
B   2   2   111.06
B   2   3   112.14
C   1   1   109.00
C   1   2   109.77
C   1   3   110.22
C   2   1   110.77
C   2   2   111.07
C   2   3   112.17
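As a concrete check of that definition, here is a minimal sketch that recomputes the first cell (item A, scenario 1, step 1) with pandas, reusing the question's data. The names cols, a_row and s1_row are illustrative only and are not part of the question.

```python
import pandas as pd

d1 = {'item': ['A', 'B', 'C'], 't1': [10, 5, 10], 't2': [10, 10, 5],
      't3': [100, 0, 0], 't4': [0, 100, 100]}
d2 = {'scenario': [1, 1, 1, 2, 2, 2], 'step': [1, 2, 3, 1, 2, 3],
      't1': [0.97, 0.98, 0.99, 0.972, 0.979, 0.991],
      't2': [0.960, 0.964, 0.964, 0.972, 0.977, 0.985],
      't3': [0.950, 0.956, 0.965, 0.967, 0.970, 0.980],
      't4': [0.945, 0.951, 0.955, 0.962, 0.964, 0.973]}
data_frame1 = pd.DataFrame(data=d1)
data_frame2 = pd.DataFrame(data=d2)

cols = ['t1', 't2', 't3', 't4']
# The t1..t4 row for item A, and the t1..t4 row for (scenario 1, step 1)
a_row = data_frame1.loc[data_frame1['item'] == 'A', cols].iloc[0]
s1_row = data_frame2.loc[(data_frame2['scenario'] == 1) & (data_frame2['step'] == 1), cols].iloc[0]

# SumProduct: multiply the matching t-columns, then add the products up
result = (a_row * s1_row).sum()
print(round(result, 3))  # 114.3
```

That is 10*0.97 + 10*0.96 + 100*0.95 + 0*0.945. Doing this cell by cell in Python loops is what gets slow, so both answers below compute all cells at once.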

Is there a pythonic way to do this? (I tried using for loops, but that takes too long.)

2 answers:

Answer 0 (score: 4)

You can do this with dot + melt:
df1 = data_frame1.set_index('item')
df1

      t1  t2   t3   t4
item                  
A     10  10  100    0
B      5  10    0  100
C     10   5    0  100

df2 = data_frame2.set_index(['scenario', 'step'])
df2

                  t1     t2     t3     t4
scenario step                            
1        1     0.970  0.960  0.950  0.945
         2     0.980  0.964  0.956  0.951
         3     0.990  0.964  0.965  0.955
2        1     0.972  0.972  0.967  0.962
         2     0.979  0.977  0.970  0.964
         3     0.991  0.985  0.980  0.973
df1.dot(df2.T).reset_index()\
   .melt('item', value_name='result')\
   .sort_values(['item', 'scenario', 'step'])

   item scenario step   result
0     A        1    1  114.300
3     A        1    2  115.040
6     A        1    3  116.040
9     A        2    1  116.140
12    A        2    2  116.560
15    A        2    3  117.760
1     B        1    1  108.950
4     B        1    2  109.640
7     B        1    3  110.090
10    B        2    1  110.780
13    B        2    2  111.065
16    B        2    3  112.105
2     C        1    1  109.000
5     C        1    2  109.720
8     C        1    3  110.220
11    C        2    1  110.780
14    C        2    2  111.075
17    C        2    3  112.135
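The core of this answer is that df1.dot(df2.T) is a single matrix product: an (items × t) matrix times a (t × scenario/step) matrix yields an (items × scenario/step) matrix holding every sumproduct at once, and melt then unrolls it into long form. A minimal NumPy-only sketch of the same idea, assuming the value columns are named t1..t4, uses the @ operator and swaps melt for np.repeat/np.tile:

```python
import numpy as np
import pandas as pd

data_frame1 = pd.DataFrame({'item': ['A', 'B', 'C'], 't1': [10, 5, 10], 't2': [10, 10, 5],
                            't3': [100, 0, 0], 't4': [0, 100, 100]})
data_frame2 = pd.DataFrame({'scenario': [1, 1, 1, 2, 2, 2], 'step': [1, 2, 3, 1, 2, 3],
                            't1': [0.97, 0.98, 0.99, 0.972, 0.979, 0.991],
                            't2': [0.960, 0.964, 0.964, 0.972, 0.977, 0.985],
                            't3': [0.950, 0.956, 0.965, 0.967, 0.970, 0.980],
                            't4': [0.945, 0.951, 0.955, 0.962, 0.964, 0.973]})

cols = ['t1', 't2', 't3', 't4']          # assumed names of the value columns
X = data_frame1[cols].to_numpy(dtype=float)   # shape (n_items, 4)
Y = data_frame2[cols].to_numpy(dtype=float)   # shape (n_rows2, 4)

# One matrix product computes every sumproduct: R[i, j] = sum over t of X[i, t] * Y[j, t]
R = X @ Y.T                                   # shape (n_items, n_rows2)

n_items, n_rows2 = R.shape
result = pd.DataFrame({
    # each item label repeated once per (scenario, step) row
    'item': np.repeat(data_frame1['item'].to_numpy(), n_rows2),
    # the (scenario, step) pairs cycled once per item
    'scenario': np.tile(data_frame2['scenario'].to_numpy(), n_items),
    'step': np.tile(data_frame2['step'].to_numpy(), n_items),
    # row-major ravel walks R item by item, matching the repeat/tile order above
    'result': R.ravel(),
})
print(result.round(3))
```

Because X @ Y.T hands the whole computation to NumPy's compiled matrix multiply, it should scale to thousands of rows far better than Python loops. The long-form output still has n_items × n_rows2 rows, so memory grows with that product.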

Answer 1 (score: 1)

We can rebuild your expected output using np.sum and np.multiply.

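import numpy as np
# Per item: multiply its t1..t4 row element-wise with every data_frame2 row, then sum across the columns
a=[np.sum(np.multiply(data_frame2.iloc[:,2:].values,value),1) for value in data_frame1.iloc[:,1:].values]
# Labels are hard-coded to match the sample data (items A-C, scenarios 1-2, steps 1-3)
d3 = {'scenario':[1,1,1,2,2,2]*len(data_frame1), 'step':[1,2,3,1,2,3]*len(data_frame1),'item':np.repeat(['A','B','C'],len(data_frame2)),'result':np.concatenate(a)}
df3=pd.DataFrame(d3)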


df3
Out[678]: 
   item   result  scenario  step
0     A  114.300         1     1
1     A  115.040         1     2
2     A  116.040         1     3
3     A  116.140         2     1
4     A  116.560         2     2
5     A  117.760         2     3
6     B  108.950         1     1
7     B  109.640         1     2
8     B  110.090         1     3
9     B  110.780         2     1
10    B  111.065         2     2
11    B  112.105         2     3
12    C  109.000         1     1
13    C  109.720         1     2
14    C  110.220         1     3
15    C  110.780         2     1
16    C  111.075         2     2
17    C  112.135         2     3