model.fit_generator() fails with use_multiprocessing=True

Time: 2019-04-11 20:21:45

Tags: tensorflow keras multiprocessing generator

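In the code example below, I can only train the model when multiprocessing is disabled.

My generator follows the tensorflow.keras.utils.Sequence description directly: https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence

Any ideas on how to fix the generator so that it works with multiprocessing?

Running on Windows 10, tensorflow 1.13.1, python 3.6.8.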

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.utils import Sequence


# Generator
class DataGenerator(Sequence):

        def __init__(self, dim, batch_size, n_channels):

            self.dim = dim            
            self.batch_size = batch_size
            self.n_channels = n_channels

        def __len__(self):
            return 100

        def __getitem__(self, idx):

            X = np.random.randn(self.batch_size, self.dim, self.n_channels)
            Y = np.random.randn(self.batch_size, self.dim, 1)

            return X, Y


dim = 32
batch_size = 64
n_channels = 3

# Generators
training_generator = DataGenerator(dim, batch_size, n_channels)
validation_generator = DataGenerator(dim, batch_size, n_channels)


# Model
model = Sequential()
model.add(layers.GRU(128, return_sequences=True, 
                     batch_input_shape=[None, training_generator.dim, training_generator.n_channels]))
model.add(layers.Dense(1))

model.compile(loss='mse', optimizer='adam')


# This training procedure runs
model.fit_generator(generator=training_generator,
                    epochs = 2,
                    steps_per_epoch = 100,
                    max_queue_size = 32,
                    validation_data=validation_generator,
                    validation_steps = 20,
                    verbose=1)

# This training procedure fails (Only change is that I added the multiprocessing options)
model.fit_generator(generator=training_generator,
                    epochs = 2,
                    steps_per_epoch = 100,
                    max_queue_size = 32,
                    validation_data=validation_generator,
                    validation_steps = 20,
                    verbose=1,
                    use_multiprocessing=True,
                    workers=4)

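I expect the second fit_generator() call to train the model just like the first one. Instead, I get no output at all, not even an error message.

1 answer:

Answer 0 (score: 0)

I tried your code on an Ubuntu 18.04.2 LTS machine with python 3.6.8 and tensorflow 1.13.1. It works in both cases, as shown below: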

2019-07-13 12:56:17.003119: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
100/100 [==============================] - 3s 27ms/step - loss: 0.9987
100/100 [==============================] - 10s 103ms/step - loss: 0.9973 - val_loss: 0.9987
Epoch 2/2
100/100 [==============================] - 3s 26ms/step - loss: 0.9955
100/100 [==============================] - 8s 83ms/step - loss: 1.0028 - val_loss: 0.9955
Multiprocessing=True ......
Epoch 1/2
100/100 [==============================] - 3s 32ms/step - loss: 0.9952
100/100 [==============================] - 9s 89ms/step - loss: 0.9962 - val_loss: 0.9952
Epoch 2/2
100/100 [==============================] - 3s 28ms/step - loss: 0.9967
100/100 [==============================] - 9s 86ms/step - loss: 0.9968 - val_loss: 0.9967

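My suggestion is to first try CPU-only mode by putting both the model and the fit_generator code under "with tf.device('/cpu:0'):". If that works, the problem is probably GPU-related, for example the installed driver or the GPU-enabled TensorFlow build. The most likely cause is the GPU hanging.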
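
A minimal sketch of a CPU-only check along these lines. Note that it swaps in a different mechanism from the tf.device wrapper named above: it hides the GPU through the CUDA_VISIBLE_DEVICES environment variable, which should be set before TensorFlow touches the GPU (doing it before the import is the safe choice). The helper name force_cpu_only is hypothetical, and whether this isolates the hang on the asker's Windows machine is an assumption, not something tested here.

```python
import os


def force_cpu_only():
    """Hide all CUDA devices from this process (hypothetical helper).

    Intended to be called before `import tensorflow`, so that the
    CUDA runtime sees no usable GPU when TensorFlow initializes it.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return os.environ["CUDA_VISIBLE_DEVICES"]


force_cpu_only()

# From here on, import tensorflow and run the question's script unchanged,
# including the fit_generator(..., use_multiprocessing=True, workers=4) call.
```

If the multiprocessing run completes with the GPU hidden, that points toward the GPU setup. If it still hangs, the cause probably lies elsewhere. Worker processes started afterwards normally inherit the variable, so the setting should apply to them as well.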