I am running a convolutional neural network on an AWS g2.2xlarge instance. The model runs fine on 30,000 images of size 64x64. However, when I try to run it with 128x128 images, it runs out of memory (see the error below), even if I feed in only a single image (with 2 channels: real and imaginary).
Since the error mentions a tensor of shape [32768,16384], I assume it occurs in the first (fully connected) layer, which takes the two-channel input image of 128 * 128 * 2 = 32768 values and outputs a 128 * 128 = 16384 vector.
I found suggestions to reduce the batch size, but I am already using only 1 input image.
Here it is claimed that with cuDNN one can reach 700-900 px images on the same AWS instance I am using (although I don't know whether they use fully connected layers). I tried two different AMIs (1 and 2), both with cuDNN installed, and still got the memory error.
My questions are:
1. How do I calculate how much memory a [32768,16384] tensor needs? I am not a computer scientist, so a detailed answer would be appreciated.
2. I am trying to understand whether the instance I am using really has too little memory for my data (a g2.2xlarge has 15 GiB), or whether I am simply doing something wrong.
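To illustrate the shapes involved, here is a simplified sketch of such a fully connected layer. It is not my actual code, just a reconstruction of the dimensions and the fc1/weights variable that appear in the error:

import tensorflow as tf  # TensorFlow 1.x, as in the traceback below

# Illustrative only: a weight matrix with the dimensions reported in the OOM message.
n_in = 128 * 128 * 2   # 32768 inputs (two channels: real and imaginary)
n_out = 128 * 128      # 16384 outputs

with tf.variable_scope("fc1"):
    weights = tf.get_variable("weights", shape=[n_in, n_out], dtype=tf.float32)

print(weights.shape)   # (32768, 16384)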
Error:
2018-01-24 16:36:53.666427: I
tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports
instructions that this TensorFlow binary was not compiled to use: SSE4.1
SSE4.2 AVX
2018-01-24 16:36:55.069050: I
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node
read from SysFS had negative value (-1), but there must be at least one NUMA
node, so returning NUMA node zero
2018-01-24 16:36:55.069287: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1062] Found device 0 with
properties:
name: GRID K520 major: 3 minor: 0 memoryClockRate(GHz): 0.797
pciBusID: 0000:00:03.0
totalMemory: 3.94GiB freeMemory: 3.90GiB
2018-01-24 16:36:55.069316: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1152] Creating TensorFlow
device (/device:GPU:0) -> (device: 0, name: GRID K520, pci bus id:
0000:00:03.0, compute capability: 3.0)
2018-01-24 16:37:59.766001: W
tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran
out of memory trying to allocate 2.00GiB. Current allocation summary follows.
2018-01-24 16:37:59.766054: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (256): Total
Chunks: 10, Chunks in use: 10. 2.5KiB allocated for chunks. 2.5KiB in use in
bin. 40B client-requested in use in bin.
2018-01-24 16:37:59.766070: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (512): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766084: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (1024): Total
Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in
bin. 1.0KiB client-requested in use in bin.
2018-01-24 16:37:59.766094: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (2048): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766108: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (4096): Total
Chunks: 2, Chunks in use: 2. 12.5KiB allocated for chunks. 12.5KiB in use in
bin. 12.5KiB client-requested in use in bin.
2018-01-24 16:37:59.766122: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (8192): Total
Chunks: 2, Chunks in use: 2. 24.5KiB allocated for chunks. 24.5KiB in use in
bin. 24.5KiB client-requested in use in bin.
2018-01-24 16:37:59.766134: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (16384): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766143: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (32768): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766155: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (65536): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766163: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (131072): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766177: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (262144): Total
Chunks: 2, Chunks in use: 2. 800.0KiB allocated for chunks. 800.0KiB in use in
bin. 800.0KiB client-requested in use in bin.
2018-01-24 16:37:59.766196: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (524288): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766208: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (1048576): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766221: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (2097152): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766230: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (4194304): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766241: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (8388608): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766250: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (16777216): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766262: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (33554432): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766271: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (67108864): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766282: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (134217728): Total
Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B
client-requested in use in bin.
2018-01-24 16:37:59.766292: I
tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (268435456): Total
Chunks: 2, Chunks in use: 1. 3.57GiB allocated for chunks. 2.00GiB in use in
bin. 2.00GiB client-requested in use in bin.
2018-01-24 16:37:59.766304: I
tensorflow/core/common_runtime/bfc_allocator.cc:644] Bin for 2.00GiB was
256.00MiB, Chunk State:
2018-01-24 16:37:59.766335: I
tensorflow/core/common_runtime/bfc_allocator.cc:650] Size: 1.57GiB |
Requested Size: 0B | in_use: 0, prev: Size: 2.00GiB | Requested Size:
2.00GiB | in_use: 1
2018-01-24 16:37:59.766358: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702680000 of
size 1280
2018-01-24 16:37:59.766374: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702680500 of
size 256
2018-01-24 16:37:59.766381: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702680600 of
size 256
2018-01-24 16:37:59.766387: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702680700 of
size 256
2018-01-24 16:37:59.766397: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702680800 of
size 256
2018-01-24 16:37:59.766402: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702680900 of
size 256
2018-01-24 16:37:59.766412: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702680a00 of
size 256
2018-01-24 16:37:59.766422: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702680b00 of
size 256
2018-01-24 16:37:59.766429: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702680c00 of
size 256
2018-01-24 16:37:59.766435: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702680d00 of
size 256
2018-01-24 16:37:59.766459: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702680e00 of
size 256
2018-01-24 16:37:59.766471: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702680f00 of
size 6400
2018-01-24 16:37:59.766477: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702682800 of
size 6400
2018-01-24 16:37:59.766482: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702684100 of
size 409600
2018-01-24 16:37:59.766492: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x7026e8100 of
size 409600
2018-01-24 16:37:59.766499: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x70274c100 of
size 12544
2018-01-24 16:37:59.766509: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x70274f200 of
size 12544
2018-01-24 16:37:59.766517: I
tensorflow/core/common_runtime/bfc_allocator.cc:662] Chunk at 0x702752300 of
size 2147483648
2018-01-24 16:37:59.766523: I
tensorflow/core/common_runtime/bfc_allocator.cc:671] Free at 0x782752300 of
size 1684724992
2018-01-24 16:37:59.766530: I
tensorflow/core/common_runtime/bfc_allocator.cc:677] Summary of in-use
Chunks by size:
2018-01-24 16:37:59.766543: I
tensorflow/core/common_runtime/bfc_allocator.cc:680] 10 Chunks of size 256
totalling 2.5KiB
2018-01-24 16:37:59.766557: I
tensorflow/core/common_runtime/bfc_allocator.cc:680] 1 Chunks of size 1280
totalling 1.2KiB
2018-01-24 16:37:59.766569: I
tensorflow/core/common_runtime/bfc_allocator.cc:680] 2 Chunks of size 6400
totalling 12.5KiB
2018-01-24 16:37:59.766577: I
tensorflow/core/common_runtime/bfc_allocator.cc:680] 2 Chunks of size 12544
totalling 24.5KiB
2018-01-24 16:37:59.766585: I
tensorflow/core/common_runtime/bfc_allocator.cc:680] 2 Chunks of size 409600
totalling 800.0KiB
2018-01-24 16:37:59.766596: I
tensorflow/core/common_runtime/bfc_allocator.cc:680] 1 Chunks of size
2147483648 totalling 2.00GiB
2018-01-24 16:37:59.766606: I
tensorflow/core/common_runtime/bfc_allocator.cc:684] Sum Total of in-use
chunks: 2.00GiB
2018-01-24 16:37:59.766620: I
tensorflow/core/common_runtime/bfc_allocator.cc:686] Stats:
Limit: 3833069568
InUse: 2148344576
MaxInUse: 2148344576
NumAllocs: 18
MaxAllocSize: 2147483648
2018-01-24 16:37:59.766635: W
tensorflow/core/common_runtime/bfc_allocator.cc:277]
2018-01-24 16:37:59.766660: W tensorflow/core/framework/op_kernel.cc:1188]
Resource exhausted: OOM when allocating tensor of shape [32768,16384] and type
float
2018-01-24 16:38:00.828932: E tensorflow/core/common_runtime/executor.cc:651]
Executor failed to create kernel. Resource exhausted: OOM when allocating
tensor of shape [32768,16384] and type float
[[Node: fc1/weights/RMSProp_1/Initializer/zeros = Const[_class=
["loc:@fc1/weights"], dtype=DT_FLOAT, value=Tensor<type: float shape:
[32768,16384] values: [0 0 0]...>,
_device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Traceback (most recent call last):
File "myAutomap.py", line 278, in <module>
print_cost=True)
File "myAutomap.py", line 240, in model
sess.run(init)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/client/session.py",
line 889, in run
run_metadata_ptr)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/client/session.py",
line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/client/session.py",
line 1317, in _do_run
options, run_metadata)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/client/session.py",
line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when
allocating tensor of shape [32768,16384] and type float
[[Node: fc1/weights/RMSProp_1/Initializer/zeros = Const[_class=
["loc:@fc1/weights"], dtype=DT_FLOAT, value=Tensor<type: float shape:
[32768,16384] values: [0 0 0]...>,
_device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Caused by op u'fc1/weights/RMSProp_1/Initializer/zeros', defined at:
File "myAutomap.py", line 278, in <module>
print_cost=True)
File "myAutomap.py", line 228, in model
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)
File "/usr/lib/python2.7/dist-
packages/tensorflow/python/training/optimizer.py", line 365, in minimize
name=name)
File "/usr/lib/python2.7/dist-
packages/tensorflow/python/training/optimizer.py", line 516, in
apply_gradients
self._create_slots([_get_variable_for(v) for v in var_list])
File "/usr/lib/python2.7/dist-packages/tensorflow/python/training/rmsprop.py",
line 113, in _create_slots
self._zeros_slot(v, "momentum", self._name)
File "/usr/lib/python2.7/dist-
packages/tensorflow/python/training/optimizer.py", line 882, in _zeros_slot
named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
File "/usr/lib/python2.7/dist-
packages/tensorflow/python/training/slot_creator.py", line 174, in
create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/usr/lib/python2.7/dist-
packages/tensorflow/python/training/slot_creator.py", line 148, in
create_slot_with_initializer
dtype)
File "/usr/lib/python2.7/dist-
packages/tensorflow/python/training/slot_creator.py", line 67, in
_create_slot_var
validate_shape=validate_shape)
File "/usr/lib/python2.7/dist-
packages/tensorflow/python/ops/variable_scope.py", line 1256, in get_variable
constraint=constraint)
File "/usr/lib/python2.7/dist-
packages/tensorflow/python/ops/variable_scope.py", line 1097, in get_variable
constraint=constraint)
File "/usr/lib/python2.7/dist-
packages/tensorflow/python/ops/variable_scope.py", line 435, in get_variable
constraint=constraint)
File "/usr/lib/python2.7/dist-
packages/tensorflow/python/ops/variable_scope.py", line 404, in _true_getter
use_resource=use_resource, constraint=constraint)
File "/usr/lib/python2.7/dist-
packages/tensorflow/python/ops/variable_scope.py", line 806, in
_get_single_variable
constraint=constraint)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py",
line 229, in __init__
constraint=constraint)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py",
line 323, in _init_from_args
initial_value(), name="initial_value", dtype=dtype)
File "/usr/lib/python2.7/dist-
packages/tensorflow/python/ops/variable_scope.py", line 780, in <lambda>
shape.as_list(), dtype=dtype, partition_info=partition_info)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/ops/init_ops.py",
line 93, in __call__
return array_ops.zeros(shape, dtype)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py",
line 1509, in zeros
output = constant(zero, shape=shape, dtype=dtype, name=name)
File "/usr/lib/python2.7/dist-
packages/tensorflow/python/framework/constant_op.py", line 218, in constant
name=name).outputs[0]
File "/usr/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py",
line 3069, in create_op
op_def=op_def)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py",
line 1579, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-
access
ResourceExhaustedError (see above for traceback): OOM when allocating tensor
of shape [32768,16384] and type float
[[Node: fc1/weights/RMSProp_1/Initializer/zeros = Const[_class=
["loc:@fc1/weights"], dtype=DT_FLOAT, value=Tensor<type: float shape:
[32768,16384] values: [0 0 0]...>,
_device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Errore di segmentazione (i.e. Segmentation fault)
Answer 0 (score: 3)
The amount of memory you need depends largely on the size of the tensor, but also on the data type you are using (int32, int64, float16, float32, float64).
So, for question 1: your tensor needs 32768 x 16384 x memory_size_of_your_datatype bytes of memory. A float64, for example, occupies 64 bits, which as the name suggests is 8 bytes, so in that case your tensor needs about 4.3e9 bytes, or roughly 4.3 gigabytes.
Consequently, a simple way to reduce memory consumption is to go from float64 to float32 or even float16 (1/2 and 1/4 of the memory, respectively), provided the loss of precision does not hurt your accuracy too much.
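As a rough illustration of that arithmetic, here is a plain-Python sketch (independent of TensorFlow; the shape is the one from your error message):

rows, cols = 32768, 16384
elements = rows * cols  # 536,870,912 elements

for name, nbytes in [("float16", 2), ("float32", 4), ("float64", 8)]:
    total = elements * nbytes
    print("%s: %d bytes = %.2f GiB" % (name, total, total / float(2 ** 30)))

# float16: 1073741824 bytes = 1.00 GiB
# float32: 2147483648 bytes = 2.00 GiB  (matches the 2.00GiB allocation in your log)
# float64: 4294967296 bytes = 4.00 GiB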
You also have to understand how the total memory of your AWS instance breaks down: the 15 GiB is system RAM, while the memory that matters here is the GPU RAM of the GPU that ships with the instance (your log shows totalMemory: 3.94GiB for the GRID K520).
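If you want to double-check how much GPU memory TensorFlow actually sees, one way (assuming TensorFlow 1.x, as in your traceback) is device_lib:

from tensorflow.python.client import device_lib

# Lists the devices TensorFlow can use; the GPU entry reports its usable memory_limit in bytes.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type, device.memory_limit)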
Also, take a look at https://www.tensorflow.org/api_docs/python/tf/profiler/Profiler
EDIT: You can pass a tf.ConfigProto() to your tf.Session(config=...) to control how the GPU is used.
In particular, look at the allow_growth, allow_soft_placement and per_process_gpu_memory_fraction options (the last one in particular should help you).
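A minimal sketch of what that could look like (TensorFlow 1.x; the fraction below is just an example value to tune):

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True                    # allocate GPU memory on demand instead of all at once
config.gpu_options.per_process_gpu_memory_fraction = 0.9  # example cap of ~90% of GPU memory
config.allow_soft_placement = True                        # fall back to CPU if an op cannot be placed on the GPU

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    # ... run your training loop here ...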