在下面的代码示例中,我只能在不使用多重处理的情况下训练模型。
我的生成器直接来自tensorflow.keras.utils。序列描述https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence
有什么想法如何修复生成器以允许多处理?
在Win 10,tensorflow 1.13.1,python 3.6.8上运行
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.utils import Sequence
# Generator
class DataGenerator(Sequence):
def __init__(self, dim, batch_size, n_channels):
self.dim = dim
self.batch_size = batch_size
self.n_channels = n_channels
def __len__(self):
return 100
def __getitem__(self, idx):
X = np.random.randn(self.batch_size, self.dim, self.n_channels)
Y = np.random.randn(self.batch_size, self.dim, 1)
return X, Y
dim= 32
batch_size= 64
n_channels= 3
# Generators
training_generator = DataGenerator(dim, batch_size, n_channels)
validation_generator = DataGenerator(dim, batch_size, n_channels)
# Model
model = Sequential()
model.add(layers.GRU(128, return_sequences=True,
batch_input_shape=[None, training_generator.dim, training_generator.n_channels]))
model.add(layers.Dense(1))
model.compile(loss='mse', optimizer='adam')
# This training procedure runs
model.fit_generator(generator=training_generator,
epochs = 2,
steps_per_epoch = 100,
max_queue_size = 32,
validation_data=validation_generator,
validation_steps = 20,
verbose=1)
# This training procedure fails (Only change is that I added the multiprocessing options)
model.fit_generator(generator=training_generator,
epochs = 2,
steps_per_epoch = 100,
max_queue_size = 32,
validation_data=validation_generator,
validation_steps = 20,
verbose=1,
use_multiprocessing=True,
workers=4)
我希望第二个fit_generator()调用像第一个一样训练模型。相反,我没有输出,甚至没有错误消息。
答案 0 :(得分:0)
我在python 3.6.8和tensorflow 1.13.1的Ubuntu 18.04.2 LTS计算机上尝试了您的代码。在两种情况下都可以正常工作,如下所示:
2019-07-13 12:56:17.003119: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
100/100 [==============================] - 3s 27ms/step - loss: 0.9987
100/100 [==============================] - 10s 103ms/step - loss: 0.9973 - val_loss: 0.9987
Epoch 2/2
100/100 [==============================] - 3s 26ms/step - loss: 0.9955
100/100 [==============================] - 8s 83ms/step - loss: 1.0028 - val_loss: 0.9955
Multiprocessing=True ......
Epoch 1/2
100/100 [==============================] - 3s 32ms/step - loss: 0.9952
100/100 [==============================] - 9s 89ms/step - loss: 0.9962 - val_loss: 0.9952
Epoch 2/2
100/100 [==============================] - 3s 28ms/step - loss: 0.9967
100/100 [==============================] - 9s 86ms/step - loss: 0.9968 - val_loss: 0.9967"
我的建议是首先将模型和fit_generator代码都放在“ with tf.device('/ cpu:0'):”下,尝试仅使用CPU模式。如果可行,则可能是与GPU相关的问题,例如正确的驱动程序,具有GPU支持的张量流等。最有可能的原因是GPU挂起。