I am trying to run TensorFlow through an open-source project called DeepLabCut to do automated video tracking, and I hit CUDA_ERROR_OUT_OF_MEMORY: out of memory as soon as training starts. I have already set allow_growth = True, and watching nvidia-smi shows that the full 12GB of GPU memory is never in use at the point the error appears. From the error log and the allocation statistics, I would like to know whether something is preventing TensorFlow from using more of the GPU.
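For reference, this is roughly how I understand the setting I enabled. A minimal TF 1.x sketch only; DeepLabCut builds its own session internally, so the exact call site in its code differs:

import tensorflow as tf

# Minimal sketch of the memory-growth setting I believe is in effect.
# With allow_growth, TensorFlow should grab GPU memory on demand rather
# than reserving the whole card up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)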
System information:
Operating system: Linux (CentOS 7)
GPU: Tesla P100, 12GB memory
CUDA version: 10.0
Driver version: 410.129
Python version: 3.6
TensorFlow version: 1.14
nvidia-smi output showing the peak GPU usage during the error:
Fri Apr 24 09:19:21 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.129 Driver Version: 410.129 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:81:00.0 Off | 0 |
| N/A 74C P0 46W / 250W | 3545MiB / 12198MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 17263 C ...y/.conda/envs/dlc-ubuntu-GPU/bin/python 3535MiB |
+-----------------------------------------------------------------------------+
Training configuration and error:
TRAIN
Config:
{'all_joints': [[0], [1], [2], [3]],
'all_joints_names': ['bodypart1', 'bodypart2', 'bodypart3', 'objectA'],
'batch_size': 1,
'bottomheight': 400,
'crop': True,
'crop_pad': 0,
'cropratio': 0.4,
'dataset': 'training-datasets/iteration-0/UnaugmentedDataSet_TESTApr27/TEST_Alex80shuffle1.mat',
'dataset_type': 'default',
'deterministic': False,
'display_iters': 2,
'fg_fraction': 0.25,
'global_scale': 0.8,
'init_weights': '/home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/deeplabcut/pose_estimation_tensorflow/models/pretrained/resnet_v1_50.ckpt',
'intermediate_supervision': False,
'intermediate_supervision_layer': 12,
'leftwidth': 400,
'location_refinement': True,
'locref_huber_loss': True,
'locref_loss_weight': 0.05,
'locref_stdev': 7.2801,
'log_dir': 'log',
'max_input_size': 1500,
'mean_pixel': [123.68, 116.779, 103.939],
'metadataset': 'training-datasets/iteration-0/UnaugmentedDataSet_TESTApr27/Documentation_data-TEST_80shuffle1.pickle',
'min_input_size': 64,
'minsize': 100,
'mirror': False,
'multi_step': [[0.001, 5]],
'net_type': 'resnet_50',
'num_joints': 4,
'optimizer': 'sgd',
'pos_dist_thresh': 17,
'project_path': '/home/melody/DeepLabCut-master/examples/TEST-Alex-2020-04-27',
'regularize': False,
'rightwidth': 400,
'save_iters': 5,
'scale_jitter_lo': 0.5,
'scale_jitter_up': 1.25,
'scoremap_dir': 'test',
'shuffle': True,
'snapshot_prefix': '/home/melody/DeepLabCut-master/examples/TEST-Alex-2020-04-27/dlc-models/iteration-0/TESTApr27-trainset80shuffle1/train/snapshot',
'stride': 8.0,
'topheight': 400,
'weigh_negatives': False,
'weigh_only_present_joints': False,
'weigh_part_predictions': False,
'weight_decay': 0.0001}
Switching batchsize to 1, as default/tensorpack/deterministic loaders do not support batches >1. Use imgaug loader.
Starting with standard pose-dataset loader.
Initializing ResNet
WARNING:tensorflow:From /home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py:62: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From /home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py:160: The name tf.losses.sigmoid_cross_entropy is deprecated. Please use tf.compat.v1.losses.sigmoid_cross_entropy instead.
WARNING:tensorflow:From /home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/losses.py:38: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
Loading ImageNet-pretrained resnet_50
WARNING:tensorflow:From /home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/deeplabcut/pose_estimation_tensorflow/train.py:143: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2020-04-27 16:32:37.986082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:81:00.0
2020-04-27 16:32:37.986229: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-04-27 16:32:37.986258: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-04-27 16:32:37.986280: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-04-27 16:32:37.986301: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-04-27 16:32:37.986326: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-04-27 16:32:37.986347: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-04-27 16:32:37.986371: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-04-27 16:32:37.987634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-04-27 16:32:37.987685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-27 16:32:37.987698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-04-27 16:32:37.987708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-04-27 16:32:37.989186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11312 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:81:00.0, compute capability: 6.0)
WARNING:tensorflow:From /home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/deeplabcut/pose_estimation_tensorflow/models/pretrained/resnet_v1_50.ckpt
Training parameter:
{'stride': 8.0, 'weigh_part_predictions': False, 'weigh_negatives': False, 'fg_fraction': 0.25, 'weigh_only_present_joints': False, 'mean_pixel': [123.68, 116.779, 103.939], 'shuffle': True, 'snapshot_prefix': '/home/melody/DeepLabCut-master/examples/TEST-Alex-2020-04-27/dlc-models/iteration-0/TESTApr27-trainset80shuffle1/train/snapshot', 'log_dir': 'log', 'global_scale': 0.8, 'location_refinement': True, 'locref_stdev': 7.2801, 'locref_loss_weight': 0.05, 'locref_huber_loss': True, 'optimizer': 'sgd', 'intermediate_supervision': False, 'intermediate_supervision_layer': 12, 'regularize': False, 'weight_decay': 0.0001, 'mirror': False, 'crop_pad': 0, 'scoremap_dir': 'test', 'batch_size': 1, 'dataset_type': 'default', 'deterministic': False, 'crop': True, 'cropratio': 0.4, 'minsize': 100, 'leftwidth': 400, 'rightwidth': 400, 'topheight': 400, 'bottomheight': 400, 'all_joints': [[0], [1], [2], [3]], 'all_joints_names': ['bodypart1', 'bodypart2', 'bodypart3', 'objectA'], 'dataset': 'training-datasets/iteration-0/UnaugmentedDataSet_TESTApr27/TEST_Alex80shuffle1.mat', 'display_iters': 2, 'init_weights': '/home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/deeplabcut/pose_estimation_tensorflow/models/pretrained/resnet_v1_50.ckpt', 'max_input_size': 1500, 'metadataset': 'training-datasets/iteration-0/UnaugmentedDataSet_TESTApr27/Documentation_data-TEST_80shuffle1.pickle', 'min_input_size': 64, 'multi_step': [[0.001, 5]], 'net_type': 'resnet_50', 'num_joints': 4, 'pos_dist_thresh': 17, 'project_path': '/home/melody/DeepLabCut-master/examples/TEST-Alex-2020-04-27', 'save_iters': 5, 'scale_jitter_lo': 0.5, 'scale_jitter_up': 1.25, 'output_stride': 16, 'deconvolutionstride': 2}
Starting training....
2020-04-27 16:32:47.058174: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2020-04-27 16:32:47.192130: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-04-27 16:32:48.124320: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.124395: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 3.60G (3865470464 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.124426: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 3.24G (3478923264 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.124442: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 2.92G (3131030784 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.124456: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 2.62G (2817927680 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.124472: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 2.36G (2536134912 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.124486: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 2.12G (2282521344 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.124529: W tensorflow/core/common_runtime/bfc_allocator.cc:237] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-27 16:32:48.127605: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.127633: W tensorflow/core/common_runtime/bfc_allocator.cc:237] Allocator (GPU_0_bfc) ran out of memory trying to allocate 83.74MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-27 16:32:48.127675: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.127695: W tensorflow/core/common_runtime/bfc_allocator.cc:237] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-27 16:32:48.133083: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.133108: W tensorflow/core/common_runtime/bfc_allocator.cc:237] Allocator (GPU_0_bfc) ran out of memory trying to allocate 687.94MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-27 16:32:48.133283: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.133313: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:58.133555: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:58.133663: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:58.133744: W tensorflow/core/common_runtime/bfc_allocator.cc:314] Allocator (GPU_0_bfc) ran out of memory trying to allocate 37.22MiB (rounded to 39024640).
After these errors I get the allocation summary below. In the final stats I notice that the InUse and MaxInUse values are far below the GPU's limit. Is something preventing TensorFlow from using more of the GPU's available memory?
2020-04-27 16:32:58.133744: W tensorflow/core/common_runtime/bfc_allocator.cc:314] Allocator (GPU_0_bfc) ran out of memory trying to allocate 37.22MiB (rounded to 39024640). Current allocation summary follows.
2020-04-27 16:32:58.133902: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (256): Total Chunks: 163, Chunks in use: 163. 40.8KiB allocated for chunks. 40.8KiB in use in bin. 19.7KiB client-requested in use in bin.
2020-04-27 16:32:58.133953: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (512): Total Chunks: 88, Chunks in use: 88. 44.0KiB allocated for chunks. 44.0KiB in use in bin. 44.0KiB client-requested in use in bin.
2020-04-27 16:32:58.134012: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (1024): Total Chunks: 177, Chunks in use: 177. 177.2KiB allocated for chunks. 177.2KiB in use in bin. 177.0KiB client-requested in use in bin.
2020-04-27 16:32:58.134048: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (2048): Total Chunks: 121, Chunks in use: 121. 242.0KiB allocated for chunks. 242.0KiB in use in bin. 242.0KiB client-requested in use in bin.
2020-04-27 16:32:58.134096: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (4096): Total Chunks: 77, Chunks in use: 77. 308.0KiB allocated for chunks. 308.0KiB in use in bin. 308.0KiB client-requested in use in bin.
2020-04-27 16:32:58.134161: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (8192): Total Chunks: 44, Chunks in use: 44. 352.0KiB allocated for chunks. 352.0KiB in use in bin. 352.0KiB client-requested in use in bin.
2020-04-27 16:32:58.134208: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (16384): Total Chunks: 3, Chunks in use: 3. 48.0KiB allocated for chunks. 48.0KiB in use in bin. 48.0KiB client-requested in use in bin.
2020-04-27 16:32:58.134252: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (32768): Total Chunks: 3, Chunks in use: 3. 110.2KiB allocated for chunks. 110.2KiB in use in bin. 110.2KiB client-requested in use in bin.
2020-04-27 16:32:58.134297: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (65536): Total Chunks: 18, Chunks in use: 18. 1.12MiB allocated for chunks. 1.12MiB in use in bin. 1.12MiB client-requested in use in bin.
2020-04-27 16:32:58.134335: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (131072): Total Chunks: 13, Chunks in use: 13. 1.79MiB allocated for chunks. 1.79MiB in use in bin. 1.79MiB client-requested in use in bin.
2020-04-27 16:32:58.134372: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (262144): Total Chunks: 26, Chunks in use: 26. 6.69MiB allocated for chunks. 6.69MiB in use in bin. 6.69MiB client-requested in use in bin.
2020-04-27 16:32:58.134411: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (524288): Total Chunks: 21, Chunks in use: 21. 11.49MiB allocated for chunks. 11.49MiB in use in bin. 11.44MiB client-requested in use in bin.
2020-04-27 16:32:58.134450: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (1048576): Total Chunks: 33, Chunks in use: 33. 33.00MiB allocated for chunks. 33.00MiB in use in bin. 33.00MiB client-requested in use in bin.
2020-04-27 16:32:58.134493: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (2097152): Total Chunks: 23, Chunks in use: 23. 50.25MiB allocated for chunks. 50.25MiB in use in bin. 50.25MiB client-requested in use in bin.
2020-04-27 16:32:58.134534: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (4194304): Total Chunks: 19, Chunks in use: 17. 86.87MiB allocated for chunks. 73.21MiB in use in bin. 69.28MiB client-requested in use in bin.
2020-04-27 16:32:58.134575: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (8388608): Total Chunks: 17, Chunks in use: 16. 160.01MiB allocated for chunks. 150.71MiB in use in bin. 142.52MiB client-requested in use in bin.
2020-04-27 16:32:58.134614: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (16777216): Total Chunks: 1, Chunks in use: 1. 16.00MiB allocated for chunks. 16.00MiB in use in bin. 9.00MiB client-requested in use in bin.
2020-04-27 16:32:58.134654: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (33554432): Total Chunks: 4, Chunks in use: 4. 148.49MiB allocated for chunks. 148.49MiB in use in bin. 148.49MiB client-requested in use in bin.
2020-04-27 16:32:58.134693: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-27 16:32:58.134732: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-27 16:32:58.134774: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (268435456): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-27 16:32:58.134816: I tensorflow/core/common_runtime/bfc_allocator.cc:780] Bin for 37.22MiB was 32.00MiB, Chunk State:
2020-04-27 16:32:58.134840: I tensorflow/core/common_runtime/bfc_allocator.cc:793] Next region of size 268435456
...list of all in-use chunks...
2020-04-27 16:32:58.171559: I tensorflow/core/common_runtime/bfc_allocator.cc:816] Sum Total of in-use chunks: 494.04MiB
2020-04-27 16:32:58.171584: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 542113792 memory_limit_: 11861498266 available bytes: 11319384474 curr_region_allocation_bytes_: 4294967296
2020-04-27 16:32:58.171616: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit: 11861498266
InUse: 518035968
MaxInUse: 527939584
NumAllocs: 1157
MaxAllocSize: 177703936
2020-04-27 16:32:58.171763: W tensorflow/core/common_runtime/bfc_allocator.cc:319] *******_*************************_*****************************************************************x
2020-04-27 16:32:58.171820: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at conv_ops.cc:486 : Resource exhausted: OOM when allocating tensor with shape[1,256,185,206] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
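For completeness, here is the kind of diagnostic I would run to see what memory limit TensorFlow itself reports, plus the only TF 1.x options I know of that cap per-process GPU memory. This is a hypothetical sketch, not code from DeepLabCut; I have not set per_process_gpu_memory_fraction anywhere, so if something is capping the allocator it must be coming from somewhere else:

import tensorflow as tf
from tensorflow.python.client import device_lib

# Print the devices TensorFlow sees, including the per-device memory limit,
# to compare against the ~11.3 GB limit reported in the log above.
print(device_lib.list_local_devices())

# The only per-process GPU memory controls I am aware of in TF 1.x.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True                      # what I intended to enable
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # hard cap; NOT set in my runs
sess = tf.Session(config=config)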