I have been training a Pix2Pix-like conditional GAN architecture with the following training loop:
for epoch in range(start_epoch, end_epoch):
    for batch_i, (input_batch, target_batch) in enumerate(dataLoader.load_batch(batch_size)):
        fake_batch = self.generator.predict(input_batch)
        d_loss_real = self.discriminator.train_on_batch(target_batch, valid)
        d_loss_fake = self.discriminator.train_on_batch(fake_batch, invalid)
        d_loss = np.add(d_loss_fake, d_loss_real) * 0.5
        g_loss = self.combined.train_on_batch([target_batch, input_batch], [valid, target_batch])
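This works, but it is inefficient because the data loader quickly becomes the bottleneck. I looked into the .fit_generator() function that Keras provides, which lets the data generator run in a background worker and is considerably faster: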
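What the background worker buys is that the next batches are loaded while the model trains on the current one. A minimal stdlib sketch of the same idea, which keeps the manual alternating loop, is a background thread filling a bounded queue. The function name prefetch and its max_queue_size parameter are hypothetical (not a Keras API), and a thread only helps if the loader spends its time in I/O or in code that releases the GIL (for example NumPy or image decoding).

```python
import queue
import threading


def prefetch(batch_iterable, max_queue_size=10):
    """Yield items from batch_iterable while a background thread loads ahead.

    The bounded queue holds at most max_queue_size ready batches, so the
    loader thread blocks instead of running arbitrarily far ahead.
    """
    q = queue.Queue(maxsize=max_queue_size)

    def producer():
        try:
            for item in batch_iterable:
                q.put((True, item))
        except Exception as exc:
            q.put((False, exc))  # forward the loader's error to the consumer
        else:
            q.put((False, None))  # normal end of data

    thread = threading.Thread(target=producer, daemon=True)
    thread.start()
    while True:
        has_item, payload = q.get()  # blocks until the next batch is ready
        if not has_item:
            break
        yield payload
    thread.join()
    if payload is not None:
        raise payload
```

In the question's loop this would mean iterating over prefetch(dataLoader.load_batch(batch_size)) inside enumerate(...) and leaving the discriminator and combined-model updates unchanged.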
self.combined.fit_generator(generator=trainLoader,
                            validation_data=evalLoader,
                            callbacks=[checkpointCallback, historyCallback],
                            workers=1,
                            use_multiprocessing=True)
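It took me a while to realize that this is wrong: I am no longer training the generator and the discriminator separately, and because the discriminator is set to trainable = False inside the combined model, the discriminator is not trained at all. That effectively destroys any adversarial loss, and I might as well train the generator on its own with MSE.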
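My question now is whether there is a workaround, for example training the discriminator in a custom callback that fires on every batch of .fit_generator(). A custom callback can be created like this:
class MyCustomCallback(tf.keras.callbacks.Callback):
    def on_train_batch_end(self, batch, logs=None):
        discriminator.train_on_batch()  # arguments (a discriminator batch) still to be worked out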
Another possibility would be to parallelize the original training loop, but I'm afraid I don't have time for that right now.
Answer 0 (score: 2)
Update: Keras has built-in enqueuers for this:
tf.keras.utils.SequenceEnqueuer
tf.keras.utils.OrderedEnqueuer
You can see a quick way to use them in this answer: https://stackoverflow.com/a/59214794/2097240
Old answer:
I created this parallelized iterator for exactly this purpose, and I use it in my own training. Here is how you use it:
for epoch, batchIndex, originalBatchIndex, xAndY in ParallelIterator(
        generator,
        epochs,
        shuffle_bool,
        use_on_epoch_end_from_generator_bool,
        workers = 8,
        queue_size=10):
    #loop content
    x_train_batch, y_train_batch = xAndY
    model.train_on_batch(x_train_batch, y_train_batch)
Here, generator is your data loader (not the GAN generator model), and it must be a keras.utils.Sequence, not just a yield-based generator.
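For reference, a minimal stand-in for such an index-addressable loader. Assumptions: plain Python lists instead of image arrays, and a hypothetical class name PairedBatchSequence; in real code it would subclass keras.utils.Sequence. The important property is that batch i can be fetched independently through __getitem__, which is what lets several workers load different batches at the same time.

```python
import random


class PairedBatchSequence:
    """Hypothetical Sequence-style loader returning (inputs, targets) batches."""

    def __init__(self, inputs, targets, batch_size, shuffle=True, seed=None):
        if len(inputs) != len(targets):
            raise ValueError("inputs and targets must have the same length")
        self.inputs, self.targets = inputs, targets
        self.batch_size = batch_size
        self.shuffle = shuffle
        self._rng = random.Random(seed)
        self._order = list(range(len(inputs)))
        self.on_epoch_end()  # initial shuffle

    def __len__(self):
        # number of batches per epoch; the last batch may be smaller
        return (len(self.inputs) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        idx = self._order[i * self.batch_size:(i + 1) * self.batch_size]
        return [self.inputs[j] for j in idx], [self.targets[j] for j in idx]

    def on_epoch_end(self):
        # reshuffle the sample order between epochs
        if self.shuffle:
            self._rng.shuffle(self._order)
```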
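Adapting it to a plain generator is not very complicated if you need to. (I don't know whether it would then parallelize correctly, though, nor whether a yield loop can be parallelized correctly at all.)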
To do so, in the iterator definition below, replace:
- len(keras_sequence) with steps_per_epoch
- keras_sequence[index] with next(keras_sequence)
and pass use_on_epoch_end = False.
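On the open question of whether a yield-based generator parallelizes: in CPython, two threads calling next() on the same generator at the same time raise ValueError: generator already executing, so after that substitution each worker's next() needs a lock. A minimal sketch follows (the class name ThreadSafeIterator is hypothetical). With the lock, the generator body itself still runs one batch at a time, so only work done after next() returns actually overlaps.

```python
import threading


class ThreadSafeIterator:
    """Serialize next() calls on a plain (yield-based) generator."""

    def __init__(self, iterator):
        self._it = iter(iterator)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # only one thread at a time may advance the underlying generator
        with self._lock:
            return next(self._it)
```

It would be passed to the iterator in place of the Sequence, for example ParallelIterator(ThreadSafeIterator(my_generator()), ...) with the replacements listed above; my_generator is a placeholder for your own loader.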
Here is the definition of the iterator:
import threading
import multiprocessing.dummy as mp
import numpy as np

#A generator that wraps a Keras Sequence and simulates a `fit_generator` behavior for custom training loops
#It will also work with any iterator that has `__len__` and `__getitem__`.
def ParallelIterator(keras_sequence, epochs, shuffle, use_on_epoch_end, workers = 4, queue_size = 10):
    sourceQueue = mp.Queue() #queue for getting batch indices
    batchQueue = mp.Queue(maxsize = queue_size) #queue for getting actual batches
    indices = np.arange(len(keras_sequence)) #array of indices to be shuffled
    use_on_epoch_end = 'on_epoch_end' in dir(keras_sequence) if use_on_epoch_end == True else False
    batchesLeft = 0
    counterLock = threading.Lock() #batchesLeft is updated from several worker threads

    #fills the batch indices queue (called when sourceQueue is empty -> a few batches before an epoch ends)
    def fillSource():
        nonlocal batchesLeft
        if shuffle == True:
            np.random.shuffle(indices)
        #puts the indices in the indices queue
        with counterLock:
            batchesLeft += len(indices)
        for i in indices:
            sourceQueue.put(i)

    #function that will load batches from the Keras Sequence
    def worker():
        nonlocal batchesLeft
        while True:
            index = sourceQueue.get(block = True) #get index from the queue
            if index is None:
                break
            item = keras_sequence[index] #get batch from the sequence
            with counterLock:
                batchesLeft -= 1
            batchQueue.put((index,item), block=True) #puts batch in the batch queue

    #creates the thread pool that will work automatically as we get from the batch queue
    pool = mp.Pool(workers, worker)
    fillSource() #at this point, data starts being taken and stored in the batchQueue

    #generation loop
    for epoch in range(epochs):
        #if not waiting for epoch end synchronization, always keeps 1 epoch filled ahead
        if (use_on_epoch_end == False):
            if epoch + 1 < epochs: #only fill if not last epoch
                fillSource()

        for batch in range(len(keras_sequence)):
            #if waiting for epoch end synchronization, wait for workers to have no batches left to get, then call epoch end and fill
            if use_on_epoch_end == True:
                if batchesLeft == 0:
                    keras_sequence.on_epoch_end()
                    if epoch + 1 < epochs: #only fill if not last epoch
                        fillSource()
                    else:
                        batchesLeft = -1 #in the last epoch, prevents from calling epoch end again and again

            #yields batches for the outside loop that is using this generator
            originalIndex, batchItems = batchQueue.get(block = True)
            yield epoch, batch, originalIndex, batchItems

    #terminating the pool - add None to the queue so any blocked worker gets released
    for i in range(workers):
        sourceQueue.put(None)
    pool.terminate()
    pool.close()
    pool.join()
    del pool,sourceQueue,batchQueue
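If the out-of-order batches (and hence the originalBatchIndex bookkeeping) are not wanted, a simpler variant can be built on concurrent.futures.ThreadPoolExecutor: keep a bounded window of submitted __getitem__ calls and yield the results in index order. This is a different technique from the queue-plus-pool design above, sketched under the assumption that the sequence exposes __len__, __getitem__ and optionally on_epoch_end; the function name ordered_parallel_batches is hypothetical. It does not prefetch across epoch boundaries, which costs a short stall per epoch but makes calling on_epoch_end safe without the batchesLeft synchronization.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def ordered_parallel_batches(sequence, epochs, workers=4, queue_size=10, on_epoch_end=True):
    """Yield (epoch, batch_index, batch), loading up to queue_size batches ahead."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for epoch in range(epochs):
            n = len(sequence)
            pending = deque()
            next_index = 0
            # prime the pipeline with the first queue_size loads
            while next_index < n and len(pending) < queue_size:
                pending.append(pool.submit(sequence.__getitem__, next_index))
                next_index += 1
            for batch_index in range(n):
                batch = pending.popleft().result()  # blocks until this batch is ready
                # keep the window full before handing the batch to the training loop
                if next_index < n:
                    pending.append(pool.submit(sequence.__getitem__, next_index))
                    next_index += 1
                yield epoch, batch_index, batch
            # every batch of this epoch has been consumed, so reshuffling now is safe
            if on_epoch_end and hasattr(sequence, "on_epoch_end"):
                sequence.on_epoch_end()
```

Usage mirrors the loop above: for epoch, batch_index, (x, y) in ordered_parallel_batches(sequence, epochs, workers=8): model.train_on_batch(x, y).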
Answer 1 (score: 1)
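Although your question already has a solution, I'd like to answer your original question: whether the discriminator can be trained in a custom callback while fitting the combined model.
The short answer is yes.
Be careful when compiling the models (the discriminator and the combined model), and follow the steps explained here: https://github.com/keras-team/keras/issues/8585#issuecomment-385729276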
Call fit or fit_generator on the combined model:
combined_model.fit_generator(train_loader, epochs=epochs, callbacks=[gan_callback])
Here gan_callback is an instance of a custom callback class that overrides on_batch_end (as you described):
def on_batch_end(self, batch_idx, logs=None):
    # x, y: a discriminator batch (real or generated samples and their labels) that you supply
    logs_disc = model_disc.train_on_batch(x, y)
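To access the discriminator model inside your callback, either pass it as an argument when constructing the callback, or retrieve it through the inherited self.model attribute (self.model.layers).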
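A minimal sketch of the per-batch discriminator step, passing everything in at construction time and following the question's original loop. Assumptions: the class name DiscriminatorTrainer is hypothetical; generator and discriminator only need .predict and .train_on_batch; batches is a separate iterator of (input_batch, target_batch) pairs, because the callback receives only the batch index and logs, not the data fit_generator consumed; valid and invalid are the label arrays from the question. The Keras callback wrapper is shown in comments because it needs TensorFlow.

```python
class DiscriminatorTrainer:
    """Hypothetical helper that performs one discriminator update per call."""

    def __init__(self, generator, discriminator, batches, valid, invalid):
        self.generator = generator
        self.discriminator = discriminator
        self.batches = batches  # iterator of (input_batch, target_batch)
        self.valid = valid
        self.invalid = invalid

    def step(self):
        input_batch, target_batch = next(self.batches)
        fake_batch = self.generator.predict(input_batch)
        d_loss_real = self.discriminator.train_on_batch(target_batch, self.valid)
        d_loss_fake = self.discriminator.train_on_batch(fake_batch, self.invalid)
        # same averaging as np.add(d_loss_fake, d_loss_real) * 0.5 in the question,
        # for either a scalar loss or a [loss, metric, ...] list
        if isinstance(d_loss_real, (list, tuple)):
            return [0.5 * (r + f) for r, f in zip(d_loss_real, d_loss_fake)]
        return 0.5 * (d_loss_real + d_loss_fake)


# Keras wiring (requires TensorFlow, so shown only as comments):
# class GANCallback(tf.keras.callbacks.Callback):
#     def __init__(self, trainer):
#         super().__init__()
#         self.trainer = trainer
#
#     def on_train_batch_end(self, batch, logs=None):
#         self.last_d_loss = self.trainer.step()
#
# combined_model.fit_generator(train_loader, epochs=epochs,
#                              callbacks=[GANCallback(trainer)])
```

Because the discriminator draws from its own iterator, its batches are not the same ones the combined model trains on in that step; for a conditional GAN that is usually acceptable, but it differs from the original loop.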
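I think this solution is nice when you want to write the losses and metrics to TensorBoard.
Inside the on_batch_end function of gan_callback, you have both sets of logs (containing the loss and metric values) directly available: logs for the combined model and logs_disc for the discriminator.
Depending on your configuration, this may produce the following warning, which can be ignored: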
UserWarning: Method on_batch_end() is slow compared to the batch update (0.151899). Check your callbacks.