How do I avoid a MemoryError when using Python neurolab?

Asked: 2015-11-15 20:25:22

Tags: python neural-network

How can I fix this MemoryError?

My train3.csv has 642,709 rows.

The .train() call fails.

I have 4 GB of DDR3 RAM.

Is there a way to avoid the MemoryError, either by using a different training method that does not fail like this, or by somehow increasing my virtual memory (I am on Windows 10)?

Code:

train_file   = 'train3.csv'
netsave_file = 'neurolab.net'
hidden_units = 440
outputs = 1

import numpy    as np
import neurolab as nl

# read training data and put it into numpy array _______________________
t = []
t_file = open(train_file, 'r')
for line in t_file.readlines():
    train = line.split(',')
    train[1] = int(train[1])
    for i in range(0,72):
        train[i+2] = float(train[i+2])               # convert to floats
    t.append(train)
t_file.close()

print "training samples read: " + str(len(t))

input = []
target = []
for train in t:
    input.append(train[2:2+72])
    target.append(train[1:2])
print "done reading input and target"
train = 0

input = np.array(input)
target = np.array(target)
print "done converting input and target to numpy array"

net = nl.net.newff([[0.0,1.0]]*72, [hidden_units,144,outputs])

# Train process _______________________________________________________
err = net.train(input, target, show=1, epochs = 2)

net.save(netsave_file)

Which produces this error:

Traceback (most recent call last):
  File "neurolab_train.py", line 43, in <module>
    err = net.train(input, target, show=1, epochs = 2)
  File "C:\Users\tintran\Anaconda\lib\site-packages\neurolab\core.py", line 165, in train
    return self.trainf(self, *args, **kwargs)
  File "C:\Users\tintran\Anaconda\lib\site-packages\neurolab\core.py", line 349, in __call__
    train(net, *args)
  File "C:\Users\tintran\Anaconda\lib\site-packages\neurolab\train\spo.py", line 79, in __call__
    **self.kwargs)
  File "C:\Users\tintran\Anaconda\lib\site-packages\scipy\optimize\optimize.py", line 782, in fmin_bfgs
    res = _minimize_bfgs(f, x0, args, fprime, callback=callback, **opts)
  File "C:\Users\tintran\Anaconda\lib\site-packages\scipy\optimize\optimize.py", line 840, in _minimize_bfgs
    I = numpy.eye(N, dtype=int)
  File "C:\Users\tintran\Anaconda\lib\site-packages\numpy\lib\twodim_base.py", line 231, in eye
    m = zeros((N, M), dtype=dtype)
MemoryError
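The traceback points at numpy.eye(N) inside scipy's fmin_bfgs: neurolab's default trainer hands every network weight to BFGS, which allocates an N-by-N workspace over them. A rough back-of-the-envelope estimate, assuming the layer sizes from the question (72 inputs, 440 and 144 hidden units, 1 output):

```python
# Estimate why the BFGS trainer runs out of memory for this network.
# Layer sizes taken from the question: 72 inputs -> 440 -> 144 -> 1 output.
layers = [72, 440, 144, 1]

# Each layer has (inputs * units) weights plus one bias per unit.
n_params = sum(prev * cur + cur for prev, cur in zip(layers, layers[1:]))
print("trainable parameters:", n_params)  # 95769

# fmin_bfgs allocates an identity matrix of shape (N, N).
# With 4-byte ints (numpy's default C-long `int` on Windows), that is:
eye_bytes = n_params ** 2 * 4
print("bytes for numpy.eye(N): %d (~%.1f GB)" % (eye_bytes, eye_bytes / 1e9))
# -> roughly 36.7 GB, regardless of how many training samples you load
```

This allocation alone dwarfs 4 GB of RAM, which is why a common workaround is a trainer that does not build an N×N workspace, e.g. setting `net.trainf = nl.train.train_gd` before calling `net.train` (verify the exact trainer names against the neurolab documentation for your version).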

2 Answers:

Answer 0 (score: 0)

numpy threw me a life jacket ... with the bonus power of @numba.jit()

Having run into trouble with a similar motivation, I spent some time looking for ways to escape the 2 GB ceiling (the O/S-enforced maximum of Private Bytes a process may allocate). The extra danger is that once it is exceeded, the whole Anaconda session gets aborted by the Windows O/S, taking all trained and tuned machine-learning instances (read: dozens of CPU-core hours) down with it.

finally:

A typical use of numpy.memmap() is shown below, cut/pasted from my problem, with notes on the speed, numerical (im)precision, and fileIO trade-offs observed and solved along the way. During feature engineering, @numba.jit() was used heavily on the large arrays, so you may benefit from that approach for further processing speed-ups. Many thanks to Travis OLIPHANT's team.

with open( getCsvFileNAME( anFxCTX[aCtxID] ), "r" ) as aFH:
         # ------------------------------------------------------------- # .memmap
           DATA       = np.memmap(     getMmapFileNAME( anFxCTX[aCtxID] ),
                                       mode  = 'w+',                     # 'readwrite',                 # 'w+' <----------------------------IO Error: ( In WIN, not UX, if file is already open with another filehandle ... ) >>> https://github.com/spacetelescope/pyasdf/issues/100
                                       shape = ( getFileRowCOUNT( aFH ), 7 ),  # .shape
                                       # .dtype choice:
                                       #   np.float64 : 8B IEEE float, ~15 significant digits ( 52-bit mantissa ) -- overly enough precision
                                       #dtype = np.float64
                                       #   np.float32 : 4B IEEE float,  ~7 significant digits ( 23-bit mantissa ) -- fairly enough precision
                                       #                [[[ BUT may crash @.jit f8 SIGNATURES ]]]
                                       dtype = np.float32
                                       )   #   np.float64 SHALL BE kept here, for DATA, as this precision keeps convoluted calculus farther from numerical error propagation ( not the case for X_ and y_ that enter into SKLEARN )
         # ------------------------------------------------------------- # .genfromtxt assignment into .memmap is elementwise
           DATA[:,:]  = np.genfromtxt( aFH,
                                       skip_header     = 0,
                                       delimiter       = ",",
                                       #                                v     v       v       v       v       v 
                                       #                       2011.08.30,12:00,1791.20,1792.60,1787.60,1789.60,835
                                       #                       2011.08.30,13:00,1789.70,1794.30,1788.70,1792.60,550
                                       #                       2011.08.30,14:00,1792.70,1816.70,1790.20,1812.10,1222
                                       #                       2011.08.30,15:00,1812.20,1831.50,1811.90,1824.70,2373
                                       converters      = { 0:  lambda aString: mPlotDATEs.date2num( datetime.datetime.strptime( aString, "%Y.%m.%d" ) ),       #_______________________________________asFloat ( 1.0, +++ )
                                                           1:  lambda aString: ( ( int( aString[0:2] ) * 60 + int( aString[3:] ) ) / 60. / 24. )               #           ( 15*60 + 00 ) / 60. / 24.__asFloat < 0.0, 1.0 )
                                                           #                                    HH:                       :MM                                                HH      MM
                                                           }
                                       )[:,:] # -------------------------- # .memmap assigned elementwise
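Stripped of the context-specific helpers above (getCsvFileNAME(), getFileRowCOUNT(), etc., which belong to my project), the core memmap pattern is only a few lines. A minimal stand-alone sketch, with an arbitrary temp-file path and shape:

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "data.mmap")

# Create a disk-backed array: the data lives in the file, not in RAM,
# so the O/S pages it in and out on demand instead of exhausting memory.
rows, cols = 1000, 7
data = np.memmap(path, mode="w+", shape=(rows, cols), dtype=np.float32)
data[:, :] = np.arange(rows * cols, dtype=np.float32).reshape(rows, cols)
data.flush()
del data  # close the writing handle before reopening (matters on Windows)

# Reopen read-only; nothing is loaded until a slice is actually touched.
view = np.memmap(path, mode="r", shape=(rows, cols), dtype=np.float32)
print(view[0, :3])   # first three values of row 0
print(view[-1, -1])  # last value: rows * cols - 1
```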

You may find the power of lambda-based converters inside numpy.genfromtxt() a great help for your .CSV parsing -- comfortable in the design phase and fast during code execution.
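The lambda-converters pattern can be shown without the matplotlib date machinery. A minimal stand-alone sketch, using two sample rows from the data above and substituting datetime's toordinal() for matplotlib's date2num (the helper name _s is mine, added because genfromtxt may hand converters bytes or str depending on the numpy version):

```python
import datetime
import io

import numpy as np

csv = io.StringIO(
    "2011.08.30,12:00,1791.20,1792.60,1787.60,1789.60,835\n"
    "2011.08.30,13:00,1789.70,1794.30,1788.70,1792.60,550\n"
)

def _s(x):
    # converters may receive bytes or str depending on numpy version
    return x.decode() if isinstance(x, bytes) else x

data = np.genfromtxt(
    csv,
    delimiter=",",
    converters={
        # "YYYY.MM.DD" -> ordinal day number, as a float
        0: lambda x: float(datetime.datetime.strptime(_s(x), "%Y.%m.%d").toordinal()),
        # "HH:MM" -> fraction of a day in [0, 1)
        1: lambda x: (int(_s(x)[0:2]) * 60 + int(_s(x)[3:])) / 60.0 / 24.0,
    },
)
print(data.shape)  # (2, 7)
print(data[0, 1])  # 0.5 -- 12:00 is half a day
```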

Answer 1 (score: 0)

The reason I was getting a MemoryError during training was that I was using 32-bit Python.

Now that I have upgraded to 64-bit Python, everything works fine.

I can even make my network large enough to hang my system (meaning there is effectively no hard limit on 64-bit Python).

I just need to find a happy medium (by adjusting the neural network size) so that my system doesn't hang.
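If you are not sure which interpreter you are running, a quick stdlib check settles it. A minimal sketch:

```python
import struct
import sys

# Pointer width tells you the interpreter's bitness.
bits = struct.calcsize("P") * 8
print("Python is %d-bit" % bits)

# A 32-bit process can address at most 4 GiB, and on Windows a 32-bit
# Python typically gets only ~2 GiB of private bytes, which large
# training workspaces blow through immediately.
print("sys.maxsize:", sys.maxsize)
```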