Is numpy.argmax slower than MATLAB's [~, idx] = max()?

Time: 2015-09-22 01:30:07

Tags: python performance matlab numpy pandas

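I am writing a Bayesian classifier for normal distributions. I have nearly identical code in Python and MATLAB, but the MATLAB code runs about 50 times faster than my Python script. I'm new to Python, so maybe I'm doing something terribly wrong. I suspect the problem is where I loop over the dataset.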

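Could numpy.argmax() be much slower than [~, idx] = max()? Is looping over a DataFrame slow? Is my use of dictionaries a poor choice (I previously tried an object, and it was even slower)?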

Any suggestions are welcome.

Python code

import numpy as np
import pandas as pd

#import the data as a data frame
train_df = pd.read_table('hw1_traindata.txt',header = None)#training
train_df.columns = [1, 2] #rename column titles

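The data has 2 columns (300 rows/samples for training, 300000 for testing). Here are the function parameters; mi and Si are the sample means and covariances.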

case3_p = {'w': [], 'w0': [], 'W': []}
case3_p['w']={1:S1.I*m1,2:S2.I*m2,3:S3.I*m3}
case3_p['w0']={1: -1.0/2.0*(m1.T*S1.I*m1)-

1.0/2.0*np.log(np.linalg.det(S1)),
            2: -1.0/2.0*(m2.T*S2.I*m2)-1.0/2.0*np.log(np.linalg.det(S2)),
            3: -1.0/2.0*(m3.T*S3.I*m3)-1.0/2.0*np.log(np.linalg.det(S3))}
case3_p['W']={1: -1.0/2.0*S1.I,
           2: -1.0/2.0*S2.I,
           3: -1.0/2.0*S3.I}
#W1=-1.0/2.0*S1.I
#w1_3=S1.I*m1
#w01_3=-1.0/2.0*(m1.T*S1.I*m1)-1.0/2.0*np.log(np.linalg.det(S1))    
def g3(x,W,w,w0):
    return x.T*W*x+w.T*x+w0

Here is the classifier/loop

train_df['case3'] = 0

for i in range(train_df.shape[0]):
    x = np.mat(train_df.loc[i,[1, 2]]).T#observation

    #case 3    
    vals = [g3(x,case3_p['W'][1],case3_p['w'][1],case3_p['w0'][1]),
            g3(x,case3_p['W'][2],case3_p['w'][2],case3_p['w0'][2]),
            g3(x,case3_p['W'][3],case3_p['w'][3],case3_p['w0'][3])]
    train_df.loc[i,'case3'] = np.argmax(vals) + 1 #add one to make it the class value

The corresponding MATLAB code

train = load('hw1_traindata.txt');

Discriminant functions

W1=-1/2*S1^-1;%there isn't one for the other cases
w1_3=S1^-1*m1';%fix the transpose thing
w10_3=-1/2*(m1*S1^-1*m1')-1/2*log(det(S1));
g1_3=@(x) x'*W1*x+w1_3'*x+w10_3';

W2=-1/2*S2^-1;
w2_3=S2^-1*m2';
w20_3=-1/2*(m2*S2^-1*m2')-1/2*log(det(S2));
g2_3=@(x) x'*W2*x+w2_3'*x+w20_3';

W3=-1/2*S3^-1;
w3_3=S3^-1*m3';
w30_3=-1/2*(m3*S3^-1*m3')-1/2*log(det(S3));
g3_3=@(x) x'*W3*x+w3_3'*x+w30_3';

Classifier

case3_class_tr = Inf(size(act_class_tr));
for i=1:length(train)
    x=train(i,:)';%current sample

    %case3
    vals = [g1_3(x),g2_3(x),g3_3(x)];%compute discriminant function value
    [~, case3_class_tr(i)] = max(vals);%get location of max

end

2 answers:

Answer 0 (score: 5)

In cases like this, it is best to profile your code. First, I created some mock data:

import numpy as np
import pandas as pd

fname = 'hw1_traindata.txt'
ar = np.random.rand(1000, 2)
np.savetxt(fname, ar, delimiter='\t')

m1, m2, m3 = [np.mat(ar).T for ar in np.random.rand(3, 2)]
S1, S2, S3 = [np.mat(ar) for ar in np.random.rand(3, 2, 2)]

I then wrapped your code in a function and profiled it with the lprun (line_profiler) IPython magic. The results:

%lprun -f train train(fname, m1, S1, m2, S2, m3, S3)
Timer unit: 5.59946e-07 s

Total time: 4.77361 s
File: <ipython-input-164-563f57dadab3>
Function: train at line 1

Line #   Hits     Time  Per Hit  %Time  Line Contents
=====================================================
     1                                 def train(fname, m1, S1, m2, S2, m3, S3):
     2      1     9868   9868.0   0.1      train_df = pd.read_table(fname ,header = None)#training
     3      1      328    328.0   0.0      train_df.columns = [1, 2] #rename column titles
     4                                 
     5      1       17     17.0   0.0      case3_p = {'w': [], 'w0': [], 'W': []}
     6      1      877    877.0   0.0      case3_p['w']={1:S1.I*m1,2:S2.I*m2,3:S3.I*m3}
     7      1      356    356.0   0.0      case3_p['w0']={1: -1.0/2.0*(m1.T*S1.I*m1)-
     8                                 
     9      1      204    204.0   0.0      1.0/2.0*np.log(np.linalg.det(S1)),
    10      1      498    498.0   0.0                  2: -1.0/2.0*(m2.T*S2.I*m2)-1.0/2.0*np.log(np.linalg.det(S2)),
    11      1      502    502.0   0.0                  3: -1.0/2.0*(m3.T*S3.I*m3)-1.0/2.0*np.log(np.linalg.det(S3))}
    12      1      235    235.0   0.0      case3_p['W']={1: -1.0/2.0*S1.I,
    13      1      229    229.0   0.0                 2: -1.0/2.0*S2.I,
    14      1      230    230.0   0.0                 3: -1.0/2.0*S3.I}
    15                                 
    16      1     1818   1818.0   0.0      train_df['case3'] = 0
    17                                 
    18   1001    17409     17.4   0.2      for i in range(train_df.shape[0]):
    19   1000  4254511   4254.5  49.9          x = np.mat(train_df.loc[i,[1, 2]]).T#observation
    20                                 
    21                                         #case 3    
    22   1000   298245    298.2   3.5          vals = [g3(x,case3_p['W'][1],case3_p['w'][1],case3_p['w0'][1]),
    23   1000   269825    269.8   3.2                  g3(x,case3_p['W'][2],case3_p['w'][2],case3_p['w0'][2]),
    24   1000   274279    274.3   3.2                  g3(x,case3_p['W'][3],case3_p['w'][3],case3_p['w0'][3])]
    25   1000  3395654   3395.7  39.8          train_df.loc[i,'case3'] = np.argmax(vals) + 1
    26                                 
    27      1       45     45.0   0.0      return train_df

Two lines together account for about 90% of the time. So let's split those lines up a bit and rerun the profiler:

%lprun -f train train(fname, m1, S1, m2, S2, m3, S3)
Timer unit: 5.59946e-07 s

Total time: 6.15358 s
File: <ipython-input-197-92d9866b57dc>
Function: train at line 1

Line #   Hits      Time  Per Hit  %Time  Line Contents
======================================================
...     
    19   1000   5292988   5293.0   48.2          thing = train_df.loc[i,[1, 2]]  # Observation
    20   1000    265101    265.1    2.4          x = np.mat(thing).T
...     
    26   1000    143142    143.1    1.3          index = np.argmax(vals) + 1  # Add one to make it the class value
    27   1000   4164122   4164.1   37.9          train_df.loc[i,'case3'] = index

Most of the time is spent indexing the Pandas DataFrame! Taking the argmax accounts for only about 1.3% of the total execution time.
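To see the indexing cost in isolation, here is a minimal timing sketch. It is not part of the original profile: the data is synthetic, and the names `read_rows_pandas` and `read_rows_numpy` are made up for illustration. Absolute timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the training data (hypothetical values).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1000, 2)), columns=[1, 2])
arr = df.to_numpy()  # the same data as a plain ndarray


def read_rows_pandas():
    # Label-based row lookup, as in the loop from the question.
    return [df.loc[i, [1, 2]] for i in range(len(df))]


def read_rows_numpy():
    # Plain ndarray row indexing.
    return [arr[i] for i in range(len(arr))]


t_pandas = timeit.timeit(read_rows_pandas, number=3)
t_numpy = timeit.timeit(read_rows_numpy, number=3)
print(f"pandas .loc: {t_pandas:.4f} s, ndarray: {t_numpy:.4f} s")
```

Both functions read the same values; only the access path differs, so any gap in the printed timings comes from the indexing machinery.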

This can be improved somewhat by preallocating train_df['case3'] and using .iloc:

%lprun -f train train(fname, m1, S1, m2, S2, m3, S3)
Timer unit: 5.59946e-07 s

Total time: 3.26716 s
File: <ipython-input-192-f6173cdf9990>
Function: train at line 1

Line #   Hits      Time  Per Hit  %Time  Line Contents
======================================================
    16      1      1548   1548.0    0.0      train_df['case3'] = np.zeros(len(train_df))
...             
    19   1000   2608489   2608.5   44.7          thing = train_df.iloc[i,[0, 1]]  # Observation
    20   1000    228959    229.0    3.9          x = np.mat(thing).T
...             
    26   1000    123165    123.2    2.1          index = np.argmax(vals) + 1  # Add one to make it the class value
    27   1000   1849283   1849.3   31.7          train_df.iloc[i,2] = index

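Nevertheless, iterating over individual values of a Pandas DataFrame in a tight loop is a bad idea. In this case, use Pandas only to load the text data (which it is very good at) and otherwise work with "raw" NumPy arrays, e.g. train_data = pd.read_table(fname, header=None).values. You can go back to Pandas once you reach the analysis stage.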
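A minimal sketch of that approach, keeping the loop but running it over a plain ndarray. The assumptions here are mine, not the question's: the file is replaced by an in-memory buffer, and the means and covariances are random placeholders built to be symmetric positive definite, so that log(det(S)) is defined.

```python
import io

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for hw1_traindata.txt: tab-separated, two columns.
text = "\n".join(f"{a}\t{b}" for a, b in rng.random((300, 2)))
train_data = pd.read_table(io.StringIO(text), header=None).values  # shape (300, 2)

# Hypothetical class means and symmetric positive-definite covariances.
means = [rng.random(2) for _ in range(3)]
covs = []
for _ in range(3):
    a = rng.random((2, 2))
    covs.append(a @ a.T + np.eye(2))

# Precompute the discriminant parameters once, outside the loop.
params = []
for m, S in zip(means, covs):
    S_inv = np.linalg.inv(S)
    W = -0.5 * S_inv
    w = S_inv @ m
    w0 = -0.5 * (m @ S_inv @ m) - 0.5 * np.log(np.linalg.det(S))
    params.append((W, w, w0))


def g3(x, W, w, w0):
    # Quadratic discriminant for one observation x of shape (2,).
    return x @ W @ x + w @ x + w0


# Collect results in a preallocated ndarray instead of a DataFrame column.
labels = np.empty(len(train_data), dtype=int)
for i, x in enumerate(train_data):
    vals = [g3(x, W, w, w0) for W, w, w0 in params]
    labels[i] = np.argmax(vals)  # zero-based class index

# Go back to Pandas only once the loop is done.
train_df = pd.DataFrame(train_data, columns=[1, 2])
train_df["case3"] = labels + 1  # shift to the 1-based class labels of the question
```

The DataFrame is touched only twice, once to load and once to store the finished column, so the per-row indexing cost seen in the profile no longer applies.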

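Some other remarks:

  • Python uses zero-based indexing. Work with it rather than forcing one-based indexing.
  • Consider using plain NumPy arrays instead of matrices. When you use matrices, you tend to end up mixing them with arrays, which leads to hard-to-debug problems.
  • MATLAB has a JIT compiler, so a speed difference between Python and MATLAB is to be expected for loop-heavy code.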
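With plain arrays and zero-based class indices, the loop can also be removed entirely. The following is a vectorized sketch that goes beyond what the profile above measured; it uses np.einsum to evaluate all three discriminants for every row at once. The data, means, and covariances are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 1000 observations, 3 classes, 2 features.
X = rng.random((1000, 2))
means = rng.random((3, 2))
A = rng.random((3, 2, 2))
covs = A @ A.transpose(0, 2, 1) + np.eye(2)  # symmetric positive definite

S_inv = np.linalg.inv(covs)                # (3, 2, 2), batched inverse
W = -0.5 * S_inv                           # (3, 2, 2)
w = np.einsum("kij,kj->ki", S_inv, means)  # (3, 2), S_k^-1 m_k
w0 = (-0.5 * np.einsum("ki,ki->k", means, w)
      - 0.5 * np.log(np.linalg.det(covs)))  # (3,)

# scores[n, k] = x_n^T W_k x_n + w_k^T x_n + w0_k
scores = np.einsum("ni,kij,nj->nk", X, W, X) + X @ w.T + w0
labels = scores.argmax(axis=1)             # zero-based class index per row
```

Add 1 to labels only at the point where one-based class values are actually needed, for example when comparing against the MATLAB output.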

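Answer 1 (score: -1)

It's really hard to say, but MATLAB straight out of the box will be faster than NumPy, mainly because it ships with its own Math Kernel Library (MKL).

Whether 50x is a reasonable estimate is hard to say; it is difficult to compare basic NumPy with MATLAB's MKL.

There are other Python distributions that ship with their own MKL, such as Enthought and Anaconda.

On Anaconda's MKL Optimizations page you will see charts comparing regular Anaconda with the MKL build. The improvement is not linear, but it is definitely there.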