I wrote a script that trains a 1D Kohonen network in MATLAB, and it works like a charm. I then tried to translate it to Python 2.7, a language I am very new to, and the script takes forever to run.
I'll explain what I'm doing and see if someone here can shed some light on the matter. I have a given dataset in a matrix y, and I want to train different SOMs with it. Each SOM is one-dimensional (a line), and its number of neurons varies. I start by training a SOM of size N=2 and finish with N=NMax, for a total of NMax-2+1 SOMs. For each SOM, I want to store the weights once the training is over, before moving on to the next one.
In MATLAB, with NMax = 5 and iterMax = 50, it takes 9.74 seconds. In Python, 54.04 seconds. The difference is huge, and the actual dataset, number of SOMs, and number of iterations are all larger, so the Python code takes forever to finish.
My current code is the following:
import numpy as np
import time
y = np.random.rand(2500,3) # Create random dataset to test
def A(d,s): # Neighborhood function
    return np.exp(-d**2 / (2*s**2))
sigma_0 = float(5) # Initial standard deviation for A
eta_0 = float(1) # Initial learning rate
iterMax = 250 # Maximum number of iterations
NMax = 10 # Maximum number of neurons
w = range(NMax - 1) # Initialize the size of the weight matrix (it will store NMax-2+1 sets of weights, each of varying size depending on the value of N)
#%% KOHONEN 1D
t = time.time() # Start time
for N in np.arange(2,NMax + 1): # Size of the network
    w[N - 2] = np.random.uniform(0,1,(N,np.size(y,axis=1))) - 0.5 # Initial weights
    iterCount = 1 # Iteration counter
    while iterCount < iterMax:
        # Mix the datapoints to choose them in random order
        mixInputs = y[np.random.permutation(np.size(y,axis = 0)),:]
        # Decrease the value of the variance and the learning rate
        sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount
        eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount
        for kk in range(np.size(mixInputs,axis = 0)): # Picking one datapoint at a time
            selectedInput = mixInputs[kk,:]
            # These two lines calculate the weight that is the nearest to the datapoint selected
            aux = np.absolute(np.array(np.kron(np.ones((N,1)),selectedInput)) - np.array(w[N - 2]))
            aux = np.sum(np.abs(aux)**2,axis=-1)
            ii = np.argmin(aux) # The node ii is the winner
            for jj in range(N):
                dist = min(np.absolute(ii-jj) , np.absolute(np.absolute(ii-jj)-N)) # Centering the neighborhood function in the winner
                w[N - 2][jj,:] = w[N - 2][jj,:] + eta * A(dist,sigma) * (selectedInput - w[N - 2][jj,:]) # Updating the weights
        print(N,iterCount)
        iterCount = iterCount + 1
elapsedTime = time.time() - t
In MATLAB, each iteration (each time the variable iterCount increases by 1) is almost instantaneous. In Python, each one takes a very long time. I don't know why they behave so differently, but I'd like to see whether it is possible to speed up the Python version. Any suggestions?
EDIT: As requested in the comments, here is the faster MATLAB version of the code.
y = rand(2500,3) % Random dataset
A = @(d,s) exp(-d^2 / (2*s^2));
sigma_0 = 5;
eta_0 = 1;
iterMax = 250;
NMax = 10;
w = cell(NMax - 1,1);
%% KOHONEN 1D
tic();
for N = 2 : NMax
    w{N - 1} = rand(N,size(y,2)) - 0.5;
    iterCount = 1;
    while (iterCount < iterMax)
        mixInputs = y(randperm(size(y,1)),:);
        sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount;
        eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount;
        for kk = 1 : size(mixInputs,1)
            input = mixInputs(kk,:);
            % [~,ii] = min(pdist2(input,w{N - 1}));
            aux = abs(repmat(input,N,1) - w{N - 1});
            [~,ii] = min((sum(aux.^2,2)));
            for jj = 1 : N
                dist = min(abs(ii-jj) , abs(abs(ii-jj)-N));
                w{N - 1}(jj,:) = w{N - 1}(jj,:) + eta * A(dist,sigma) * (input - w{N - 1}(jj,:));
            end
        end
        N % Show N
        iterCount = iterCount + 1 % Show iterCount
    end
end
toc();
Answer 0 (score: 2)
Use the profiling module to find out which functions are being called and how long they take.
In the output below, the columns have the following meanings:
ncalls: the number of calls
tottime: the total time spent in the given function (excluding time spent in calls to sub-functions)
percall: the quotient of tottime divided by ncalls
cumtime: the cumulative time spent in this and all subfunctions (from invocation till exit); this figure is accurate even for recursive functions
percall: the quotient of cumtime divided by primitive calls
filename:lineno(function): provides the respective data of each function
It looks like you're calling A() many times... often with the same values.
python2.7 -m cProfile -s tottime ${YOUR_SCRIPT}
5481855 function calls (5481734 primitive calls) in 4.986 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.572 1.572 4.986 4.986 x.py:1(<module>)
214500 0.533 0.000 0.533 0.000 x.py:8(A)
107251 0.462 0.000 1.986 0.000 shape_base.py:686(kron)
107251 0.345 0.000 0.456 0.000 numeric.py:1015(outer)
214502 0.266 0.000 0.563 0.000 {sorted}
...
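If you'd rather drive the profiler from inside the script than from the command line, the same data is available programmatically; a minimal sketch (here train() stands for whatever function you wrap the training loop in, it is not a name from the original script):

import cProfile
import pstats

# Run the training under the profiler and dump the stats to a file
cProfile.run('train()', 'som.prof')

# Print the 10 most expensive functions by internal time
pstats.Stats('som.prof').sort_stats('tottime').print_stats(10)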
Try caching the values:
A_vals = {}
def A(d,s): # Neighborhood function
    t = (d,s)
    if t in A_vals:
        return A_vals[t]
    ret = np.exp(-d**2 / (2*s**2))
    A_vals[t] = ret
    return ret
Now we see:
6206113 function calls (6205992 primitive calls) in 4.986 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.727 1.727 4.986 4.986 x.py:1(<module>)
121451 0.491 0.000 2.180 0.000 shape_base.py:686(kron)
121451 0.371 0.000 0.496 0.000 numeric.py:1015(outer)
242902 0.293 0.000 0.621 0.000 {sorted}
121451 0.265 0.000 0.265 0.000 {method 'reduce' of 'numpy.ufunc' objects}
...
242900 0.091 0.000 0.091 0.000 x.py:7(A)
...
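Since d only ever takes the integer values 0..N-1 here and sigma is fixed for a whole pass over the data, another option (my own variation, not something from the original script) is to precompute the entire row of neighborhood values once per iteration instead of memoizing call by call:

import numpy as np

def neighborhood_row(N, sigma):
    # A(d, sigma) for every possible integer distance d = 0 .. N-1
    d = np.arange(N)
    return np.exp(-d**2 / (2 * sigma**2))

# Inside the while loop, after sigma is updated:
#   A_row = neighborhood_row(N, sigma)
# and then replace A(dist, sigma) with A_row[dist] in the jj loop.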
At this point it becomes a simple optimization exercise!...
Next on your list is kron() - enjoy!
You'll also find it useful to break the script up into smaller functions (both for style and for profiling). I've done the following purely for profiling reasons, so you would be better off using sensible names and possibly making a better split.
import numpy as np
import time
y = np.random.rand(2500,3) # Create random dataset to test
A_vals = {}
def A(d,s): # Neighborhood function
    t = (d,s)
    if t in A_vals:
        return A_vals[t]
    ret = np.exp(-d**2 / (2*s**2))
    A_vals[t] = ret
    return ret

def a():
    sigma_0 = float(5) # Initial standard deviation for A
    eta_0 = float(1) # Initial learning rate
    iterMax = 250 # Maximum number of iterations
    NMax = 10 # Maximum number of neurons
    w = range(NMax - 1) # Initialize the size of the weight matrix (it will store NMax-2+1 sets of weights, each of varying size depending on the value of N)
    #%% KOHONEN 1D
    t = time.time() # Start time
    for N in np.arange(2,NMax + 1): # Size of the network
        b(w, N, sigma_0, eta_0, iterMax)
    elapsedTime = time.time() - t

def b(w, N, sigma_0, eta_0, iterMax):
    w[N - 2] = np.random.uniform(0,1,(N,np.size(y,axis=1))) - 0.5 # Initial weights
    for iterCount in range(1, iterMax):
        c(N, sigma_0, eta_0, iterMax, iterCount, w)

def c(N, sigma_0, eta_0, iterMax, iterCount, w):
    # Mix the datapoints to choose them in random order
    mixInputs = y[np.random.permutation(np.size(y,axis = 0)),:]
    # Decrease the value of the variance and the learning rate
    sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount
    eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount
    for kk in range(np.size(mixInputs,axis = 0)): # Picking one datapoint at a time
        d(N, w, mixInputs, sigma, eta, kk)
    print(N,iterCount)

def d(N, w, mixInputs, sigma, eta, kk):
    selectedInput = mixInputs[kk,:]
    # These two lines calculate the weight that is the nearest to the datapoint selected
    aux = np.absolute(np.array(np.kron(np.ones((N,1)),selectedInput)) - np.array(w[N - 2]))
    aux = np.sum(np.abs(aux)**2,axis=-1)
    ii = np.argmin(aux) # The node ii is the winner
    for jj in range(N):
        e(N, w, sigma, eta, selectedInput, ii, jj)

def e(N, w, sigma, eta, selectedInput, ii, jj):
    dist = min(np.absolute(ii-jj) , np.absolute(np.absolute(ii-jj)-N)) # Centering the neighborhood function in the winner
    f(N, w, sigma, eta, selectedInput, jj, dist)

def f(N, w, sigma, eta, selectedInput, jj, dist):
    w[N - 2][jj,:] = w[N - 2][jj,:] + eta * A(dist,sigma) * (selectedInput - w[N - 2][jj,:]) # Updating the weights

a()
6701974 function calls (6701853 primitive calls) in 4.985 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
238921 0.691 0.000 0.777 0.000 x.py:56(f)
119461 0.613 0.000 4.923 0.000 x.py:43(d)
119461 0.498 0.000 2.144 0.000 shape_base.py:686(kron)
238921 0.462 0.000 1.280 0.000 x.py:52(e)
119461 0.369 0.000 0.495 0.000 numeric.py:1015(outer)
This flags f() as the biggest time sink.
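If you want to see where all those f() calls originate, pstats can also print the callers; assuming the stats were dumped to a file (e.g. som.prof, as in the cProfile sketch above):

import pstats

# List every function that calls f(); the regex matches the "(f)" part of each profile entry
pstats.Stats('som.prof').print_callers(r'\(f\)')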
Answer 1 (score: 1)
Here's a quick stab at some speedups - I think the output is the same, but it would certainly take some time to double-check:
import numpy as np
import time
np.random.seed(1234)
y = np.random.rand(2500,3) # Create random dataset to test
sigma_0 = float(5) # Initial standard deviation for A
eta_0 = float(1) # Initial learning rate
iterMax = 10 # Maximum number of iterations
NMax = 10 # Maximum number of neurons
w = {} # Initialize the size of the weight matrix (it will store NMax-2+1 sets of weights, each of varying size depending on the value of N)
#%% KOHONEN 1D
t = time.time() # Start time
for N in np.arange(2,NMax + 1): # Size of the network
    w[N - 2] = np.random.uniform(0,1,(N,np.size(y,axis=1))) - 0.5 # Initial weights
    iterCount = 1 # Iteration counter
    while iterCount < iterMax:
        # Mix the datapoints to choose them in random order
        mixInputs = y[np.random.permutation(np.size(y,axis = 0)),:]
        # Decrease the value of the variance and the learning rate
        sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount
        s2 = 2*sigma**2
        eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount
        for kk in range(np.size(mixInputs,axis = 0)): # Picking one datapoint at a time
            selectedInput = mixInputs[kk,:]
            # This line calculates the weight that is the nearest to the datapoint selected
            aux = np.sum((selectedInput - np.array(w[N - 2]))**2, axis = -1)
            ii = np.argmin(aux)
            jjs = np.abs(ii - np.arange(N))
            dists = np.min(np.vstack([jjs , np.absolute(jjs-N)]), axis = 0)
            w[N - 2] = w[N - 2] + eta * np.exp((-dists**2)/s2).T[:,np.newaxis] * (selectedInput - w[N - 2]) # Updating the weights
        print(N,iterCount)
        iterCount = iterCount + 1
elapsedTime = time.time() - t
The key speedups come mainly from using broadcasting to cut down on loops and function calls.
We can replace this line:
aux = np.absolute(np.array(np.kron(np.ones((N,1)),selectedInput)) - np.array(w[N - 2]))
with:
aux = np.abs(selectedInput - np.array(w[N - 2]))
(I've squashed this down further in the next couple of steps.) Numpy broadcasting gives us the same result without having to take the kron product.
As an example:
np.kron(np.ones((3,1)), np.array([6,5,4])) - np.arange(-9,0).reshape(3,3)
gives the same output as:
np.array([6,5,4]) - np.arange(-9,0).reshape(3,3)
kron(np.ones((N,1)), x) gives an N * x.shape[0] array containing N copies of x. Broadcasting handles this much more cheaply.
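If you want to convince yourself of the cost difference, here is a quick, purely illustrative timeit comparison (numbers will vary by machine; the shapes are just an example):

import timeit

setup = "import numpy as np; x = np.random.rand(3); W = np.random.rand(10, 3)"
# kron-based version, as in the original script
print(timeit.timeit("np.kron(np.ones((10, 1)), x) - W", setup=setup, number=100000))
# plain broadcasting
print(timeit.timeit("x - W", setup=setup, number=100000))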
The other major speedup is reducing the for jj in range(N): loop to matrix operations. We compute 2*sigma**2 once per pass through the loop, replace the A function with a native numpy call, and vectorize the rest.