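Thread-safe Keras generator for model.fit_generator with Python 3.6.x

Asked: 2017-10-01 03:41:59

Tags: multithreading python-3.x keras generator

I am using Keras 2.0.8 for a 2D U-net medical segmentation project. At the moment I am struggling to create a custom thread-safe image generator (for X and y at the same time). X and y are 4D arrays of shape n_img x n_col x n_row x T, where T is 4 for X and 1 for y (the 4 numeric labels are converted to a one-hot encoding along the 4th dimension).

Here is my code: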

def gen_tr(X,y,batch_size):
    n=np.floor((len(X)-1)/batch_size).astype(int)
    s=list(X.shape)
    s[0]=batch_size
    while 1:
        for i in range(n):
            j=0
            X_b=np.zeros(s,dtype=np.float32)
            y_b=np.zeros(s,dtype=int)
            while j<batch_size:
                data=distort_imgs(X[i*batch_size+j,:,:,0, np.newaxis],
                              X[i*batch_size+j,:,:,1, np.newaxis], 
                              X[i*batch_size+j,:,:,2, np.newaxis],
                              X[i*batch_size+j,:,:,3, np.newaxis], 
                              y[i*batch_size+j,:,:,0, np.newaxis])
                X_i=np.concatenate(data[:4],axis=2)
                y_i=data[-1]
                y_i=np.concatenate((y_i==0,y_i==1,y_i==2,y_i==4),
                               axis=2).astype(int)
                X_b[j]=X_i
                y_b[j]=y_i
                j+=1
            yield (X_b,y_b)
batch_size=20
gen = gen_tr(X_train,Y_train,batch_size)
steps=np.floor((len(X_train)-1)/batch_size).astype(int)
model.fit_generator(gen,steps_per_epoch=steps, epochs=5, verbose=1, shuffle=True, 
max_queue_size=10,workers=2, use_multiprocessing=False)

Error:

Exception in thread Thread-13:
Traceback (most recent call last):
  File "D:\Users\SZ_KOCOT\Anaconda3\envs\cnn1\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "D:\Users\SZ_KOCOT\Anaconda3\envs\cnn1\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Users\SZ_KOCOT\Anaconda3\envs\cnn1\lib\site-packages\keras\utils\data_utils.py", line 568, in data_generator_task
    generator_output = next(self._generator)
ValueError: generator already executing

Traceback (most recent call last):

  File "<ipython-input-17-1a91cea3a91e>", line 7, in <module>
    max_queue_size=10,workers=2, use_multiprocessing=False)

  File "D:\Users\SZ_KOCOT\Anaconda3\envs\cnn1\lib\site-packages\keras\legacy\interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)

  File "D:\Users\SZ_KOCOT\Anaconda3\envs\cnn1\lib\site-packages\keras\engine\training.py", line 2011, in fit_generator
    generator_output = next(output_generator)

StopIteration

I have tried the following solutions: keunwoochoi.wordpress.com and stanford (the same approach).

Neither of them worked. When I add:

import threading
class threadsafe_iter:
    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()
    def __iter__(self):
        return self
    def __next__(self):
        with self.lock:
            return self.it.next()

def threadsafe_generator(f):
    def g(*a, **kw):
        return threadsafe_iter(f(*a, **kw))
    return g

@threadsafe_generator
#now goes my generator from above

I get this error:

Epoch 1/5
Exception in thread Thread-10:
Traceback (most recent call last):
  File "D:\Users\SZ_KOCOT\Anaconda3\envs\cnn1\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "D:\Users\SZ_KOCOT\Anaconda3\envs\cnn1\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Users\SZ_KOCOT\Anaconda3\envs\cnn1\lib\site-packages\keras\utils\data_utils.py", line 568, in data_generator_task
    generator_output = next(self._generator)
  File "<ipython-input-12-24605a93d655>", line 17, in __next__
    return self.it.next()
AttributeError: 'generator' object has no attribute 'next'

Traceback (most recent call last):

  File "<ipython-input-13-b07830ef87c0>", line 5, in <module>
    max_queue_size=10,workers=2, use_multiprocessing=False)

  File "D:\Users\SZ_KOCOT\Anaconda3\envs\cnn1\lib\site-packages\keras\legacy\interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)

  File "D:\Users\SZ_KOCOT\Anaconda3\envs\cnn1\lib\site-packages\keras\engine\training.py", line 2011, in fit_generator
    generator_output = next(output_generator)

StopIteration

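With workers=1 in fit_generator everything works fine (using the generator without the code from those solutions), including next(gen) and gen.__next__().

A single-threaded data generator is not fast enough, especially since I have multiple cores available...

Can someone help me with this? I am new to threading in Python.

EDIT: I have found a solution/workaround. It may be a bit too hacky for Keras, but it works. It was inspired by zsdonghao. By splitting the dataset augmentation into 10 parts of 2750 samples each, I was able to prepare the data very quickly and keep the GTX 1080 at almost 100% utilization. RAM usage also stays below ~22 GB. Training for 1 epoch takes about 14-15 minutes, and data preparation/augmentation takes 10-12 minutes in total. Compared with fit_generator with a single worker, the time is more than 3 times shorter.

In case it helps someone, here is the exact code: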

import time

import numpy as np
import pandas as pd
import tensorlayer as tl

batch_size=20
epochs=10
step_size=2750
steps=np.floor((len(X_train)-1)/step_size).astype(int)
s=list(X_train.shape)
train_all=pd.DataFrame()
eval_all=pd.DataFrame()

#training and evaluation
for i in range(epochs):
    start_time = time.clock()
    print('Epoch: {0:02d}'.format(i+1))
    for j in range(steps):
        ind=range(step_size*j,step_size*(j+1))
        data = tl.prepro.threading_data([_ for _ in zip(X_train[ind,:,:,0, np.newaxis],
                                                        X_train[ind,:,:,1, np.newaxis], 
                                                        X_train[ind,:,:,2, np.newaxis],
                                                        X_train[ind,:,:,3, np.newaxis],
                                                        y_train[ind])],fn=distort_imgs,thread_count=None)
        X_s = data[:,0:4,:,:,:]                                                 
        y_s = data[:,4,:,:,:]
        X_s = X_s.transpose((0,2,3,1,4))
        X_s.shape = (step_size, s[1], s[2], s[3])
        y_s=np.concatenate((y_s==0,y_s==1,y_s==2,y_s==4),
                                           axis=3).astype(int)
        train=model.fit(X_s, y_s,class_weight=weights, verbose=0,
                        batch_size=batch_size, epochs=i+2,initial_epoch=i+1)
        train.history['epoch']=i+1
        train.history['step']=j+1
        train=pd.DataFrame(train.history)
        train_all=pd.concat([train_all,train],ignore_index=True)
        print(train.to_string(index=False))
    eval=model.evaluate(X_test, y_test, batch_size=batch_size, verbose=0)
    eval=pd.DataFrame({'val_dice_coe':eval[0],'val_dice_hard_coe':eval[1], 'val_iou_coe':eval[2], 'val_loss':eval[3]},index=[0])
    eval['epoch']=i+1
    eval_all=pd.concat([eval_all,eval],ignore_index=True)
    print(eval.to_string(index=False))
    model.save('{0}_ep_{1}.h5'.format(model_name,i+1))
    print('Epoch {0:02d} took: {1:.3f} min'.format(i+1,(time.clock()-start_time)/60))

1 Answer:

Answer 0 (score: 1)

In Python 3, you should use next(self.it) instead of self.it.next().
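As a minimal sketch of that fix, the wrapper from the question could look roughly like this under Python 3. The class is renamed `ThreadSafeIter` here, and `count_batches` is a hypothetical stand-in for `gen_tr` that only yields batch indices; neither name comes from Keras.

```python
import threading


class ThreadSafeIter:
    """Wrap an iterator so that only one thread advances it at a time."""

    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            # Python 3: use the built-in next(), not the Python 2 .next() method.
            return next(self.it)


def threadsafe_generator(f):
    """Decorator: make a generator function return a lock-protected iterator."""
    def g(*args, **kwargs):
        return ThreadSafeIter(f(*args, **kwargs))
    return g


@threadsafe_generator
def count_batches(n):
    # Hypothetical stand-in for gen_tr: yields batch indices forever.
    while True:
        for i in range(n):
            yield i
```

Note that the lock is held while the generator body runs, so any augmentation done inside the generator is still executed by one thread at a time. This should remove the "generator already executing" error, but it may not make batch preparation faster.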

You can also try Keras Sequences, which seem safer because they are indexed, so the correct data order is preserved when multiprocessing.
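A rough sketch of what such a Sequence might look like is shown below. `AugmentedBatches` and its `augment` callback are hypothetical names; in the question's setting, `augment` would be the place to call `distort_imgs` and build the one-hot labels. The `ImportError` fallback is only there so the sketch can be run without Keras installed.

```python
import math

try:
    from keras.utils import Sequence  # the real Keras base class
except ImportError:
    Sequence = object  # fallback so the sketch runs without Keras


class AugmentedBatches(Sequence):
    """Index-addressable batches: batch idx is computed only from idx."""

    def __init__(self, X, y, batch_size, augment=None):
        super().__init__()
        self.X, self.y = X, y
        self.batch_size = batch_size
        # Hypothetical hook: takes (X_batch, y_batch), returns the pair to train on.
        self.augment = augment or (lambda xb, yb: (xb, yb))

    def __len__(self):
        # Number of batches per epoch (the last one may be smaller).
        return math.ceil(len(self.X) / self.batch_size)

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        return self.augment(self.X[lo:hi], self.y[lo:hi])
```

Because every batch is requested by index, workers share no iterator state, so no lock is needed. An instance would be passed to fit_generator in place of the generator.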

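Finally, workers seems to affect only the generator itself, not the model. In my tests (I'm not good at threading...) the only difference I could see with more workers was a larger queue of preloaded data waiting to be fed into the model.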
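That observation can be pictured with a small producer/consumer sketch. This is not Keras's actual implementation; `fill_queue`, `consume`, and `make_batch` are hypothetical names, and the parameters merely mirror `workers` and `max_queue_size` from fit_generator.

```python
import queue
import threading


def fill_queue(make_batch, n_batches, workers=2, max_queue_size=10):
    """Start worker threads that prepare batches into a bounded queue."""
    q = queue.Queue(maxsize=max_queue_size)
    counter = iter(range(n_batches))
    lock = threading.Lock()

    def worker():
        while True:
            with lock:  # only handing out the next index is serialized
                idx = next(counter, None)
            if idx is None:
                return
            q.put(make_batch(idx))  # blocks while the queue is full

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(workers)]
    for t in threads:
        t.start()
    return q, threads


def consume(q, n_batches):
    """Stand-in for the training loop: takes one prepared batch at a time."""
    return [q.get() for _ in range(n_batches)]
```

In this picture, adding workers only fills the queue faster; the consumer still takes one batch at a time, so the model itself does not run any faster once the queue is kept full.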