Error solving SVHN with TensorFlow: "Resource exhausted: OOM when allocating tensor ..."

Date: 2016-03-01 22:45:19

Tags: linux tensorflow deep-learning

I am trying to solve the SVHN dataset classification problem using the convolutional neural network provided here: https://www.tensorflow.org/versions/0.6.0/tutorials/deep_cnn/index.html#convolutional-neural-networks

I read the data and format it like this:

import scipy.io
import tensorflow as tf

read_input = scipy.io.loadmat('data/train_32x32.mat')
converted_label = tf.cast(read_input['y'], tf.int32)
converted_image = tf.cast(read_input['X'], tf.float32)
# SVHN stores images as (height, width, channels, samples); put samples first.
reshaped_image = tf.transpose(converted_image, [3, 0, 1, 2])
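
For context, a quick check of the loaded shapes (the expected values below assume the standard SVHN training split):

print(read_input['X'].shape)  # expect (32, 32, 3, 73257)
print(read_input['y'].shape)  # expect (73257, 1)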

In the _generate_image_and_label_batch function, I made some changes to the code, since the input images in train_32x32.mat and test_32x32.mat are already in 4-D format.

images, label_batch = tf.train.shuffle_batch(
      [image, label],
      batch_size=FLAGS.batch_size,
      # enqueue_many=True because image/label already hold a whole
      # set of examples along dimension 0.
      enqueue_many=True,
      num_threads=num_preprocess_threads,
      capacity=min_queue_examples + 3 * FLAGS.batch_size,
      min_after_dequeue=min_queue_examples)

I eventually run into these errors:

Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 4
W tensorflow/core/kernels/cast_op.cc:66] Resource exhausted: OOM when allocating tensor with shape dim { size: 32 } dim { size: 32 } dim { size: 3 } dim { size: 73257 }
W tensorflow/core/common_runtime/executor.cc:1027] 0x7f1c180015a0 Compute status: Resource exhausted: OOM when allocating tensor with shape dim { size: 32 } dim { size: 32 } dim { size: 3 } dim { size: 73257 }
     [[Node: Cast_1 = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8, _device="/job:localhost/replica:0/task:0/cpu:0"](Cast_1/x)]]
W tensorflow/core/kernels/cast_op.cc:66] Resource exhausted: OOM when allocating tensor with shape dim { size: 32 } dim { size: 32 } dim { size: 3 } dim { size: 73257 }
W tensorflow/core/common_runtime/executor.cc:1027] 0x7f1c280ea810 Compute status: Resource exhausted: OOM when allocating tensor with shape dim { size: 32 } dim { size: 32 } dim { size: 3 } dim { size: 73257 }
     [[Node: Cast_1 = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8, _device="/job:localhost/replica:0/task:0/cpu:0"](Cast_1/x)]]
Killed

Please let me know if I have made a mistake anywhere in my logic.

Thanks,

Sara

1 Answer:

Answer 0 (score: 1):

Note that your data contains 32 * 32 * 3 * 73257 entries, which is 900 MB as floats or 1800 MB as doubles. So you allocate 1800 MB at read_input['X'], then TF converts it into a tensor to feed to the cast, which is another 900 MB. The output of tf.cast is another 900 MB tensor, and the output of transpose is yet another 900 MB tensor.

So you probably need about 4.5 GB of RAM for this to work.
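
A quick back-of-the-envelope check of that arithmetic (a sketch in plain Python; the split into one double-sized and three float-sized copies follows the reasoning above):

entries = 32 * 32 * 3 * 73257         # ~225 million values
float_mb = entries * 4 / 1e6          # ~900 MB as 4-byte floats
double_mb = entries * 8 / 1e6         # ~1800 MB as 8-byte doubles
total_mb = double_mb + 3 * float_mb   # input array + constant + cast + transpose
print(total_mb)                       # ~4500 MB, i.e. ~4.5 GB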

Generally, this approach (converting the data into Constant nodes) is only recommended for "small" problems. There is a hard limit of 2 GB on what you can put into a constant, but smaller sizes (i.e. > 100 MB) can already cause problems if you move to a GPU (for example, here).
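
One way to keep the data out of the graph entirely (a minimal sketch, not from the original answer) is to do the cast and transpose in numpy and feed batches through placeholders:

import numpy as np
import scipy.io
import tensorflow as tf

data = scipy.io.loadmat('data/train_32x32.mat')
# Convert in numpy so no giant Constant node is baked into the graph.
images = data['X'].transpose([3, 0, 1, 2]).astype(np.float32)
labels = data['y'].astype(np.int32)

image_batch = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])
label_batch = tf.placeholder(tf.int32, shape=[None, 1])
# At each training step, feed one slice of the arrays:
#   sess.run(train_op, feed_dict={image_batch: images[i:i + 128],
#                                 label_batch: labels[i:i + 128]})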

A more scalable approach is to use an input pipeline like the one in the CIFAR example.
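
For reference, a minimal sketch of what that pipeline looks like in the CIFAR-10 example (the binary record layout here is CIFAR's, not SVHN's, so it is illustrative only):

import tensorflow as tf

filename_queue = tf.train.string_input_producer(['data_batch_1.bin'])
reader = tf.FixedLengthRecordReader(record_bytes=1 + 32 * 32 * 3)
key, value = reader.read(filename_queue)
record = tf.decode_raw(value, tf.uint8)
# First byte is the label, the rest is a [depth, height, width] image.
label = tf.cast(tf.slice(record, [0], [1]), tf.int32)
image = tf.reshape(tf.slice(record, [1], [32 * 32 * 3]), [3, 32, 32])
image = tf.cast(tf.transpose(image, [1, 2, 0]), tf.float32)
# Examples now come off a queue one at a time, so memory use stays bounded;
# pass image/label to tf.train.shuffle_batch with enqueue_many=False.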