Error solving SVHN with TensorFlow: "Resource exhausted: OOM when allocating tensor ..."

Date: 2016-03-01 22:45:19

Tags: linux tensorflow deep-learning

I am trying to solve the SVHN dataset classification problem using the convolutional neural network provided here: https://www.tensorflow.org/versions/0.6.0/tutorials/deep_cnn/index.html#convolutional-neural-networks

I read the data and format it like this:

import scipy.io
import tensorflow as tf

read_input = scipy.io.loadmat('data/train_32x32.mat')
converted_label = tf.cast(read_input['y'], tf.int32)
converted_image = tf.cast(read_input['X'], tf.float32)
# SVHN stores images as (height, width, channels, samples); put samples first.
reshaped_image = tf.transpose(converted_image, [3, 0, 1, 2])
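
For context, a quick check of the loaded shapes (the expected values below assume the standard SVHN training split):

print(read_input['X'].shape)  # expect (32, 32, 3, 73257)
print(read_input['y'].shape)  # expect (73257, 1)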

In the _generate_image_and_label_batch function, I made some changes to the code, since the input images in train_32x32.mat and test_32x32.mat are already in 4-D format.

images, label_batch = tf.train.shuffle_batch(
      [image, label],
      batch_size=FLAGS.batch_size,
      # enqueue_many=True because image/label already hold a whole
      # set of examples along dimension 0.
      enqueue_many=True,
      num_threads=num_preprocess_threads,
      capacity=min_queue_examples + 3 * FLAGS.batch_size,
      min_after_dequeue=min_queue_examples)

I eventually run into these errors:

Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 4
W tensorflow/core/kernels/cast_op.cc:66] Resource exhausted: OOM when allocating tensor with shape dim { size: 32 } dim { size: 32 } dim { size: 3 } dim { size: 73257 }
W tensorflow/core/common_runtime/executor.cc:1027] 0x7f1c180015a0 Compute status: Resource exhausted: OOM when allocating tensor with shape dim { size: 32 } dim { size: 32 } dim { size: 3 } dim { size: 73257 }
     [[Node: Cast_1 = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8, _device="/job:localhost/replica:0/task:0/cpu:0"](Cast_1/x)]]
W tensorflow/core/kernels/cast_op.cc:66] Resource exhausted: OOM when allocating tensor with shape dim { size: 32 } dim { size: 32 } dim { size: 3 } dim { size: 73257 }
W tensorflow/core/common_runtime/executor.cc:1027] 0x7f1c280ea810 Compute status: Resource exhausted: OOM when allocating tensor with shape dim { size: 32 } dim { size: 32 } dim { size: 3 } dim { size: 73257 }
     [[Node: Cast_1 = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8, _device="/job:localhost/replica:0/task:0/cpu:0"](Cast_1/x)]]
Killed

Please let me know if I have made a mistake anywhere in my logic.

Thanks,

Sara

1 Answer:

Answer 0 (score: 1):

Note that your data contains 32 * 32 * 3 * 73257 entries, which is 900 MB as floats or 1800 MB as doubles. So you allocate 1800 MB at read_input['X'], then TF converts it into a tensor to feed to the cast, which is another 900 MB. The output of tf.cast is another 900 MB tensor, and the output of transpose is yet another 900 MB tensor.

So you probably need about 4.5 GB of RAM for this to work.
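
A quick back-of-the-envelope check of that arithmetic (a sketch in plain Python; the split into one double-sized and three float-sized copies follows the reasoning above):

entries = 32 * 32 * 3 * 73257         # ~225 million values
float_mb = entries * 4 / 1e6          # ~900 MB as 4-byte floats
double_mb = entries * 8 / 1e6         # ~1800 MB as 8-byte doubles
total_mb = double_mb + 3 * float_mb   # input array + constant + cast + transpose
print(total_mb)                       # ~4500 MB, i.e. ~4.5 GB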

Generally, this approach (converting the data into Constant nodes) is only recommended for "small" problems. There is a hard limit of 2 GB on what you can put into a constant, but smaller sizes (i.e. > 100 MB) can already cause problems if you move to a GPU (for example, here).
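
One way to keep the data out of the graph entirely (a minimal sketch, not from the original answer) is to do the cast and transpose in numpy and feed batches through placeholders:

import numpy as np
import scipy.io
import tensorflow as tf

data = scipy.io.loadmat('data/train_32x32.mat')
# Convert in numpy so no giant Constant node is baked into the graph.
images = data['X'].transpose([3, 0, 1, 2]).astype(np.float32)
labels = data['y'].astype(np.int32)

image_batch = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])
label_batch = tf.placeholder(tf.int32, shape=[None, 1])
# At each training step, feed one slice of the arrays:
#   sess.run(train_op, feed_dict={image_batch: images[i:i + 128],
#                                 label_batch: labels[i:i + 128]})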

A more scalable approach is to use an input pipeline like the one in the CIFAR example.
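
For reference, a minimal sketch of what that pipeline looks like in the CIFAR-10 example (the binary record layout here is CIFAR's, not SVHN's, so it is illustrative only):

import tensorflow as tf

filename_queue = tf.train.string_input_producer(['data_batch_1.bin'])
reader = tf.FixedLengthRecordReader(record_bytes=1 + 32 * 32 * 3)
key, value = reader.read(filename_queue)
record = tf.decode_raw(value, tf.uint8)
# First byte is the label, the rest is a [depth, height, width] image.
label = tf.cast(tf.slice(record, [0], [1]), tf.int32)
image = tf.reshape(tf.slice(record, [1], [32 * 32 * 3]), [3, 32, 32])
image = tf.cast(tf.transpose(image, [1, 2, 0]), tf.float32)
# Examples now come off a queue one at a time, so memory use stays bounded;
# pass image/label to tf.train.shuffle_batch with enqueue_many=False.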