I have a large amount of pickled data stored on my hard drive. I have a generator function that reads these pickle files in batches (batch_size = 512), and I use TensorFlow's queues to speed this process up. Currently my queue_size is 4096, and I use 6 threads to match my 6 physical cores. When I run the code and monitor my GPU load (I use a Titan X), it looks fine at the start:
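To be concrete about the input side, here is a minimal sketch of what my generator function does. The file layout and names are simplified assumptions, and each pickle file is assumed to hold one ready-made batch:

```python
import glob
import pickle

def data_generator_from_pickles(pattern):
    """Yield batches read from the pickle files matching `pattern`, forever."""
    while True:  # wrap around so the input pipeline never starves
        for path in sorted(glob.glob(pattern)):
            with open(path, "rb") as f:
                yield pickle.load(f)  # one pre-made (x, y) batch per file
```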
But as time goes by, I see the GPU load decrease:
I also see the execution time per epoch increase:
Epoch 1 | Exec. time: 1646.523872
Epoch 2 | Exec. time: 1760.770192
Epoch 3 | Exec. time: 1861.450039
Epoch 4 | Exec. time: 1952.52812
Epoch 5 | Exec. time: 2167.598431
Epoch 6 | Exec. time: 2278.203603
Epoch 7 | Exec. time: 2320.280606
Epoch 8 | Exec. time: 2467.036160
Epoch 9 | Exec. time: 2584.932837
Epoch 10 | Exec. time: 2736.121618
...
Epoch 20 | Exec. time: 3841.635191
The declining GPU load I observe is consistent with this.
Now, the question is: why is this happening? Is it a bug in TensorFlow's queues? Am I doing something wrong?! I am using TensorFlow 1.4. In case it helps, this is how I define the queue and its enqueue and dequeue operations:
def get_train_queue(batch_size, data_generator, queue_size, num_threads):
    # get train queue to parallelize loading data
    q = tf.FIFOQueue(capacity = queue_size,
                     dtypes = [tf.float32, tf.float32, tf.float32, tf.float32],
                     shapes = [[batch_size, x_height, x_width, num_channels],
                               [batch_size, num_classes],
                               [batch_size, latent_size],
                               [batch_size]])
    batch = next(data_generator)
    batch_z = np.random.uniform(-1.0, 1.0, size = (batch_size, latent_size))
    mask = get_labled_mask(labeled_rate, batch_size)
    enqueue_op = q.enqueue((batch[0], batch[1], batch_z, mask))
    qr = tf.train.QueueRunner(q, [enqueue_op] * num_threads)
    tf.train.add_queue_runner(qr)
    return q
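Conceptually, this mirrors a plain producer/consumer prefetch. A pure-Python sketch of the same idea, with queue.Queue standing in for the FIFOQueue and threads for the QueueRunner (all names here are illustrative):

```python
import queue
import threading

def start_prefetch(data_generator, queue_size=4096, num_threads=6):
    """Worker threads pull batches from the generator and enqueue them,
    analogous to a FIFOQueue fed by a QueueRunner."""
    q = queue.Queue(maxsize=queue_size)
    lock = threading.Lock()  # next() on a shared generator is not thread-safe

    def worker():
        while True:
            with lock:
                try:
                    batch = next(data_generator)
                except StopIteration:
                    return
            q.put(batch)  # blocks when the queue is full

    for _ in range(num_threads):
        threading.Thread(target=worker, daemon=True).start()
    return q
```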
and
def train_per_batch(sess, q, train_samples_count, batch_size, parameters, epoch):
    # train per batch and get train loss and accuracy
    t_total = 0
    for iteration in range(int(train_samples_count / batch_size)):
        t_start = time.time()
        data = q.dequeue()
        feed_dictionary = {parameters['x']: sess.run(data[0]),
                           parameters['z']: sess.run(data[2]),
                           parameters['label']: sess.run(data[1]),
                           parameters['labeled_mask']: sess.run(data[3]),
                           parameters['dropout_rate']: dropout,
                           parameters['d_init_learning_rate']: D_init_learning_rate,
                           parameters['g_init_learning_rate']: G_init_learning_rate,
                           parameters['is_training']: True}
        sess.run(parameters['D_optimizer'], feed_dict = feed_dictionary)
        sess.run(parameters['G_optimizer'], feed_dict = feed_dictionary)
        train_D_loss = sess.run(parameters['D_L'], feed_dict = feed_dictionary)
        train_G_loss = sess.run(parameters['G_L'], feed_dict = feed_dictionary)
        t_total += (time.time() - t_start)
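One thing I am unsure about here: my four sess.run(data[i]) calls each look like an independent run of the dequeue op. A pure-Python stand-in for that pattern (illustrative names only) shows why that would pull from four different batches:

```python
import queue

q = queue.Queue()
for batch_id in range(8):
    # each item stands in for one enqueued batch: (x, label, z, mask)
    q.put(("x%d" % batch_id, "label%d" % batch_id,
           "z%d" % batch_id, "mask%d" % batch_id))

def run_dequeue(index):
    """Stand-in for sess.run(data[index]): every call performs a fresh dequeue."""
    return q.get()[index]

# four separate 'sess.run' calls -> four different batches get consumed
feeds = [run_dequeue(0), run_dequeue(2), run_dequeue(1), run_dequeue(3)]
# feeds == ['x0', 'z1', 'label2', 'mask3'] -- x, z, label, mask are mismatched

# fetching the whole tuple in one call keeps the batch consistent
x, label, z, mask = q.get()
```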
I have also tried tf.data.Dataset.from_generator(), as recommended for TensorFlow 1.4:
train_dataset = tf.data.Dataset.from_generator(data_generator_training_from_pickles,
                                               (tf.float32, tf.float32, tf.float32, tf.float32),
                                               ([batch_size, x_height, x_width, num_channels],
                                                [batch_size, num_classes],
                                                [batch_size, latent_size],
                                                [batch_size]))
and then trained with:
def train_per_batch(sess, train_dataset, train_samples_count, batch_size, parameters, epoch):
    # train per batch and get train loss and accuracy
    t_total = 0
    for iteration in range(int(train_samples_count / batch_size)):
        t_start = time.time()
        data = train_dataset.make_one_shot_iterator().get_next()
        feed_dictionary = {parameters['x']: sess.run(data[0]),
                           parameters['z']: sess.run(data[2]),
                           parameters['label']: sess.run(data[1]),
                           parameters['labeled_mask']: sess.run(data[3]),
                           parameters['dropout_rate']: dropout,
                           parameters['d_init_learning_rate']: D_init_learning_rate,
                           parameters['g_init_learning_rate']: G_init_learning_rate,
                           parameters['is_training']: True}
        sess.run(parameters['D_optimizer'], feed_dict = feed_dictionary)
        sess.run(parameters['G_optimizer'], feed_dict = feed_dictionary)
        train_D_loss = sess.run(parameters['D_L'], feed_dict = feed_dictionary)
        train_G_loss = sess.run(parameters['G_L'], feed_dict = feed_dictionary)
        t_total += (time.time() - t_start)
This was the worst of all. Apparently no queuing is happening here:
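My guess is that creating the iterator inside the loop restarts the input pipeline on every iteration. A pure-Python analog of that pattern (illustrative only):

```python
def batch_generator():
    """Stand-in for the dataset: yields batch indices 0..4."""
    for i in range(5):
        yield i

# a fresh iterator every loop iteration (like calling
# make_one_shot_iterator() inside the loop) always restarts at batch 0
seen_wrong = [next(iter(batch_generator())) for _ in range(3)]
# seen_wrong == [0, 0, 0]

# creating the iterator once, outside the loop, advances through the data
it = iter(batch_generator())
seen_right = [next(it) for _ in range(3)]
# seen_right == [0, 1, 2]
```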