Question

Similar to "Tensorflow batch_join's allow_smaller_final_batch doesn't work?", I want to feed a batch of images to TensorFlow.

I have 365 images on disk and a batch size of 100. This means the last run should fetch the remaining 65 images, but I can't get that to work.
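As a quick check of the arithmetic above: 365 = 3 × 100 + 65, so a pipeline that keeps the smaller final batch should produce three full batches and one batch of 65. A minimal sketch (the function name `expected_batch_sizes` is only illustrative):

```python
def expected_batch_sizes(num_items, batch_size):
    """Sizes of the batches when the smaller final batch is kept."""
    full, rest = divmod(num_items, batch_size)  # 365, 100 -> (3, 65)
    return [batch_size] * full + ([rest] if rest else [])


print(expected_batch_sizes(365, 100))  # [100, 100, 100, 65]
```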

Here is what I got working, reproducing Eypros' answer:

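import tensorflow as tf  # TF 1.x queue-based input API

# Assumes sess, input_queue, load_images and nthreads are defined earlier,
# and that load_images returns a list of tensors (e.g. [image]).
# One loading branch per reader thread; batch_join expects a list of tensor lists.
image_list = [load_images(input_queue.dequeue()) for _ in range(nthreads)]

image_batch = tf.train.batch_join(image_list, batch_size=100,
                                  enqueue_many=True, allow_smaller_final_batch=True,
                                  capacity=10)

# Start the queue runners that feed input_queue and the batch_join queue
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

for n in range(3):
    print(n, len(sess.run([image_batch])))

coord.request_stop()
coord.join(threads)

print(n + 1, len(sess.run([image_batch])))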

I get the expected output:

0 100
1 100
2 100
3 10

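However, if I set capacity to 65, I don't get the desired 65 files in the last batch; I only get 20. I should add that this happens with nthreads = 4. When I reduce the number of threads, the results get worse.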

What I tried was to query the size of the input queue and then sleep for a while before calling coord.request_stop():

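import time

# How many items are still waiting in the input queue?
numq = sess.run(input_queue.size())
print('after', n, 'batches, input_queue size:', numq)
if numq > 0:
    time.sleep(0.08)  # give the reader threads time to drain the queue
    numq = sess.run(input_queue.size())
    print('after sleep, input_queue size:', numq)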

This helps, but if the sleep is too long (i.e. the input queue drops to 0), my last sess.run() hangs forever. I don't understand why.

I hate the sleep() hack. I'm looking for a clean, efficient and reliable way to consume all the images in a multithreaded session.
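The usual way to avoid timing hacks is to have each producer signal "I'm finished" and let the consumer keep reading until every producer has signalled, only then emitting the smaller final batch. A blocking read on an empty queue that nobody will refill or close simply waits forever, which is one plausible explanation for the hang after a long sleep. The sketch below shows the idea with plain Python `threading` and `queue.Queue` as an analog, not TensorFlow API; `load_image` is a hypothetical stand-in for the real loader. In the TF1 queue API the corresponding signal is, as far as I understand it, the input queue being closed (e.g. a producer built with num_epochs=1), after which the batch op returns the partial batch and then raises tf.errors.OutOfRangeError.

```python
import queue
import threading

_DONE = object()  # sentinel a worker puts once its share of files is exhausted


def load_image(path):
    """Hypothetical stand-in for the real image loader."""
    return f"pixels of {path}"


def worker(paths, out_q):
    for p in paths:
        out_q.put(load_image(p))
    out_q.put(_DONE)  # explicit end-of-data signal instead of sleeping


def batched_images(paths, batch_size, nthreads):
    out_q = queue.Queue(maxsize=batch_size)  # bounded, like a queue capacity
    shards = [paths[i::nthreads] for i in range(nthreads)]  # round-robin split
    threads = [threading.Thread(target=worker, args=(s, out_q)) for s in shards]
    for t in threads:
        t.start()

    batch, finished = [], 0
    while finished < nthreads:  # keep reading until every producer is done
        item = out_q.get()
        if item is _DONE:
            finished += 1
            continue
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # smaller final batch, emitted only after all producers finished
        yield batch
    for t in threads:
        t.join()


paths = [f"img_{i:03d}.png" for i in range(365)]
sizes = [len(b) for b in batched_images(paths, batch_size=100, nthreads=4)]
print(sizes)  # [100, 100, 100, 65]
```

No sleep or capacity tuning is involved: the final batch size follows only from the number of items, independent of thread scheduling.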

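I noticed that tf.train.batch_join is deprecated, but I don't know how to translate my simple logic into the suggested tf.data.Dataset.interleave(...).batch(batch_size). In particular, I don't know what to pass to interleave. Would my problem simply go away if I used a Dataset?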
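A minimal sketch of the tf.data version, assuming TensorFlow 2.x with eager execution. `interleave` takes a function that returns a Dataset for each input element; here each element becomes a one-image dataset, and `cycle_length`/`num_parallel_calls` play roughly the role of `nthreads`. To keep it self-contained, `fake_load_image` is a hypothetical loader that builds a synthetic 8×8×3 tensor from an index instead of reading a file.

```python
import tensorflow as tf  # assumes TF 2.x, eager execution

NUM_IMAGES = 365   # number of images on disk, from the question
BATCH_SIZE = 100
NUM_PARALLEL = 4   # plays the role of nthreads


def fake_load_image(index):
    """Hypothetical stand-in for load_images(): a synthetic 8x8x3 image.

    With real files you would instead decode them, e.g.
    tf.image.decode_png(tf.io.read_file(path), channels=3), then resize.
    """
    return tf.fill([8, 8, 3], tf.cast(index, tf.float32))


# One element per image (an index here; in practice a file path).
sources = tf.data.Dataset.range(NUM_IMAGES)

# Each source yields a one-element dataset; cycle_length sources are
# processed concurrently.
images = sources.interleave(
    lambda i: tf.data.Dataset.from_tensors(fake_load_image(i)),
    cycle_length=NUM_PARALLEL,
    num_parallel_calls=NUM_PARALLEL,
)

# batch() keeps the smaller final batch by default (drop_remainder=False).
batches = images.batch(BATCH_SIZE)

batch_sizes = [int(b.shape[0]) for b in batches]
print(batch_sizes)  # [100, 100, 100, 65]
```

Iteration simply ends after the last (partial) batch, so there is no coordinator, capacity or sleep to tune. When each file holds exactly one image, `sources.map(fake_load_image, num_parallel_calls=NUM_PARALLEL)` is the more direct equivalent; `interleave` pays off when each source (e.g. a TFRecord file) yields many records.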

Tensorflow batch_join: read all images in the last batch
