I am trying to run TensorFlow through an open-source project called DeepLabCut to do automated video tracking, and I hit CUDA_ERROR_OUT_OF_MEMORY: out of memory as soon as training starts. I have already set allow_growth = True, and watching nvidia-smi shows that the full 12GB of GPU memory is never in use at the point the error appears. From the error log and the allocation statistics, I would like to know whether something is preventing TensorFlow from using more of the GPU.
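For reference, this is roughly how I understand the setting I enabled. A minimal TF 1.x sketch only; DeepLabCut builds its own session internally, so the exact call site in its code differs:

import tensorflow as tf

# Minimal sketch of the memory-growth setting I believe is in effect.
# With allow_growth, TensorFlow should grab GPU memory on demand rather
# than reserving the whole card up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)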
System information:
Operating system: Linux (CentOS 7)
GPU: Tesla P100, 12GB memory
CUDA version: 10.0
Driver version: 410.129
Python version: 3.6
TensorFlow version: 1.14
nvidia-smi output showing the peak GPU usage during the error:
Fri Apr 24 09:19:21 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.129 Driver Version: 410.129 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:81:00.0 Off | 0 |
| N/A 74C P0 46W / 250W | 3545MiB / 12198MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 17263 C ...y/.conda/envs/dlc-ubuntu-GPU/bin/python 3535MiB |
+-----------------------------------------------------------------------------+
Training configuration and error:
TRAIN
Config:
{'all_joints': [[0], [1], [2], [3]],
'all_joints_names': ['bodypart1', 'bodypart2', 'bodypart3', 'objectA'],
'batch_size': 1,
'bottomheight': 400,
'crop': True,
'crop_pad': 0,
'cropratio': 0.4,
'dataset': 'training-datasets/iteration-0/UnaugmentedDataSet_TESTApr27/TEST_Alex80shuffle1.mat',
'dataset_type': 'default',
'deterministic': False,
'display_iters': 2,
'fg_fraction': 0.25,
'global_scale': 0.8,
'init_weights': '/home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/deeplabcut/pose_estimation_tensorflow/models/pretrained/resnet_v1_50.ckpt',
'intermediate_supervision': False,
'intermediate_supervision_layer': 12,
'leftwidth': 400,
'location_refinement': True,
'locref_huber_loss': True,
'locref_loss_weight': 0.05,
'locref_stdev': 7.2801,
'log_dir': 'log',
'max_input_size': 1500,
'mean_pixel': [123.68, 116.779, 103.939],
'metadataset': 'training-datasets/iteration-0/UnaugmentedDataSet_TESTApr27/Documentation_data-TEST_80shuffle1.pickle',
'min_input_size': 64,
'minsize': 100,
'mirror': False,
'multi_step': [[0.001, 5]],
'net_type': 'resnet_50',
'num_joints': 4,
'optimizer': 'sgd',
'pos_dist_thresh': 17,
'project_path': '/home/melody/DeepLabCut-master/examples/TEST-Alex-2020-04-27',
'regularize': False,
'rightwidth': 400,
'save_iters': 5,
'scale_jitter_lo': 0.5,
'scale_jitter_up': 1.25,
'scoremap_dir': 'test',
'shuffle': True,
'snapshot_prefix': '/home/melody/DeepLabCut-master/examples/TEST-Alex-2020-04-27/dlc-models/iteration-0/TESTApr27-trainset80shuffle1/train/snapshot',
'stride': 8.0,
'topheight': 400,
'weigh_negatives': False,
'weigh_only_present_joints': False,
'weigh_part_predictions': False,
'weight_decay': 0.0001}
Switching batchsize to 1, as default/tensorpack/deterministic loaders do not support batches >1. Use imgaug loader.
Starting with standard pose-dataset loader.
Initializing ResNet
WARNING:tensorflow:From /home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py:62: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From /home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py:160: The name tf.losses.sigmoid_cross_entropy is deprecated. Please use tf.compat.v1.losses.sigmoid_cross_entropy instead.
WARNING:tensorflow:From /home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/losses.py:38: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
Loading ImageNet-pretrained resnet_50
WARNING:tensorflow:From /home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/deeplabcut/pose_estimation_tensorflow/train.py:143: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2020-04-27 16:32:37.986082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:81:00.0
2020-04-27 16:32:37.986229: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-04-27 16:32:37.986258: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-04-27 16:32:37.986280: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-04-27 16:32:37.986301: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-04-27 16:32:37.986326: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-04-27 16:32:37.986347: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-04-27 16:32:37.986371: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-04-27 16:32:37.987634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-04-27 16:32:37.987685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-27 16:32:37.987698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-04-27 16:32:37.987708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-04-27 16:32:37.989186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11312 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:81:00.0, compute capability: 6.0)
WARNING:tensorflow:From /home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/deeplabcut/pose_estimation_tensorflow/models/pretrained/resnet_v1_50.ckpt
Training parameter:
{'stride': 8.0, 'weigh_part_predictions': False, 'weigh_negatives': False, 'fg_fraction': 0.25, 'weigh_only_present_joints': False, 'mean_pixel': [123.68, 116.779, 103.939], 'shuffle': True, 'snapshot_prefix': '/home/melody/DeepLabCut-master/examples/TEST-Alex-2020-04-27/dlc-models/iteration-0/TESTApr27-trainset80shuffle1/train/snapshot', 'log_dir': 'log', 'global_scale': 0.8, 'location_refinement': True, 'locref_stdev': 7.2801, 'locref_loss_weight': 0.05, 'locref_huber_loss': True, 'optimizer': 'sgd', 'intermediate_supervision': False, 'intermediate_supervision_layer': 12, 'regularize': False, 'weight_decay': 0.0001, 'mirror': False, 'crop_pad': 0, 'scoremap_dir': 'test', 'batch_size': 1, 'dataset_type': 'default', 'deterministic': False, 'crop': True, 'cropratio': 0.4, 'minsize': 100, 'leftwidth': 400, 'rightwidth': 400, 'topheight': 400, 'bottomheight': 400, 'all_joints': [[0], [1], [2], [3]], 'all_joints_names': ['bodypart1', 'bodypart2', 'bodypart3', 'objectA'], 'dataset': 'training-datasets/iteration-0/UnaugmentedDataSet_TESTApr27/TEST_Alex80shuffle1.mat', 'display_iters': 2, 'init_weights': '/home/melody/.conda/envs/dlc-ubuntu-GPU/lib/python3.6/site-packages/deeplabcut/pose_estimation_tensorflow/models/pretrained/resnet_v1_50.ckpt', 'max_input_size': 1500, 'metadataset': 'training-datasets/iteration-0/UnaugmentedDataSet_TESTApr27/Documentation_data-TEST_80shuffle1.pickle', 'min_input_size': 64, 'multi_step': [[0.001, 5]], 'net_type': 'resnet_50', 'num_joints': 4, 'pos_dist_thresh': 17, 'project_path': '/home/melody/DeepLabCut-master/examples/TEST-Alex-2020-04-27', 'save_iters': 5, 'scale_jitter_lo': 0.5, 'scale_jitter_up': 1.25, 'output_stride': 16, 'deconvolutionstride': 2}
Starting training....
2020-04-27 16:32:47.058174: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2020-04-27 16:32:47.192130: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-04-27 16:32:48.124320: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.124395: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 3.60G (3865470464 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.124426: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 3.24G (3478923264 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.124442: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 2.92G (3131030784 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.124456: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 2.62G (2817927680 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.124472: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 2.36G (2536134912 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.124486: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 2.12G (2282521344 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.124529: W tensorflow/core/common_runtime/bfc_allocator.cc:237] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-27 16:32:48.127605: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.127633: W tensorflow/core/common_runtime/bfc_allocator.cc:237] Allocator (GPU_0_bfc) ran out of memory trying to allocate 83.74MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-27 16:32:48.127675: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.127695: W tensorflow/core/common_runtime/bfc_allocator.cc:237] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-27 16:32:48.133083: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.133108: W tensorflow/core/common_runtime/bfc_allocator.cc:237] Allocator (GPU_0_bfc) ran out of memory trying to allocate 687.94MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-27 16:32:48.133283: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:48.133313: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:58.133555: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:58.133663: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-27 16:32:58.133744: W tensorflow/core/common_runtime/bfc_allocator.cc:314] Allocator (GPU_0_bfc) ran out of memory trying to allocate 37.22MiB (rounded to 39024640).
After these errors I get the allocation summary below. In the final stats I notice that the InUse and MaxInUse values are far below the GPU's limit. Is something preventing TensorFlow from using more of the GPU's available memory?
2020-04-27 16:32:58.133744: W tensorflow/core/common_runtime/bfc_allocator.cc:314] Allocator (GPU_0_bfc) ran out of memory trying to allocate 37.22MiB (rounded to 39024640). Current allocation summary follows.
2020-04-27 16:32:58.133902: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (256): Total Chunks: 163, Chunks in use: 163. 40.8KiB allocated for chunks. 40.8KiB in use in bin. 19.7KiB client-requested in use in bin.
2020-04-27 16:32:58.133953: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (512): Total Chunks: 88, Chunks in use: 88. 44.0KiB allocated for chunks. 44.0KiB in use in bin. 44.0KiB client-requested in use in bin.
2020-04-27 16:32:58.134012: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (1024): Total Chunks: 177, Chunks in use: 177. 177.2KiB allocated for chunks. 177.2KiB in use in bin. 177.0KiB client-requested in use in bin.
2020-04-27 16:32:58.134048: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (2048): Total Chunks: 121, Chunks in use: 121. 242.0KiB allocated for chunks. 242.0KiB in use in bin. 242.0KiB client-requested in use in bin.
2020-04-27 16:32:58.134096: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (4096): Total Chunks: 77, Chunks in use: 77. 308.0KiB allocated for chunks. 308.0KiB in use in bin. 308.0KiB client-requested in use in bin.
2020-04-27 16:32:58.134161: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (8192): Total Chunks: 44, Chunks in use: 44. 352.0KiB allocated for chunks. 352.0KiB in use in bin. 352.0KiB client-requested in use in bin.
2020-04-27 16:32:58.134208: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (16384): Total Chunks: 3, Chunks in use: 3. 48.0KiB allocated for chunks. 48.0KiB in use in bin. 48.0KiB client-requested in use in bin.
2020-04-27 16:32:58.134252: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (32768): Total Chunks: 3, Chunks in use: 3. 110.2KiB allocated for chunks. 110.2KiB in use in bin. 110.2KiB client-requested in use in bin.
2020-04-27 16:32:58.134297: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (65536): Total Chunks: 18, Chunks in use: 18. 1.12MiB allocated for chunks. 1.12MiB in use in bin. 1.12MiB client-requested in use in bin.
2020-04-27 16:32:58.134335: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (131072): Total Chunks: 13, Chunks in use: 13. 1.79MiB allocated for chunks. 1.79MiB in use in bin. 1.79MiB client-requested in use in bin.
2020-04-27 16:32:58.134372: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (262144): Total Chunks: 26, Chunks in use: 26. 6.69MiB allocated for chunks. 6.69MiB in use in bin. 6.69MiB client-requested in use in bin.
2020-04-27 16:32:58.134411: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (524288): Total Chunks: 21, Chunks in use: 21. 11.49MiB allocated for chunks. 11.49MiB in use in bin. 11.44MiB client-requested in use in bin.
2020-04-27 16:32:58.134450: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (1048576): Total Chunks: 33, Chunks in use: 33. 33.00MiB allocated for chunks. 33.00MiB in use in bin. 33.00MiB client-requested in use in bin.
2020-04-27 16:32:58.134493: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (2097152): Total Chunks: 23, Chunks in use: 23. 50.25MiB allocated for chunks. 50.25MiB in use in bin. 50.25MiB client-requested in use in bin.
2020-04-27 16:32:58.134534: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (4194304): Total Chunks: 19, Chunks in use: 17. 86.87MiB allocated for chunks. 73.21MiB in use in bin. 69.28MiB client-requested in use in bin.
2020-04-27 16:32:58.134575: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (8388608): Total Chunks: 17, Chunks in use: 16. 160.01MiB allocated for chunks. 150.71MiB in use in bin. 142.52MiB client-requested in use in bin.
2020-04-27 16:32:58.134614: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (16777216): Total Chunks: 1, Chunks in use: 1. 16.00MiB allocated for chunks. 16.00MiB in use in bin. 9.00MiB client-requested in use in bin.
2020-04-27 16:32:58.134654: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (33554432): Total Chunks: 4, Chunks in use: 4. 148.49MiB allocated for chunks. 148.49MiB in use in bin. 148.49MiB client-requested in use in bin.
2020-04-27 16:32:58.134693: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-27 16:32:58.134732: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-27 16:32:58.134774: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (268435456): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-27 16:32:58.134816: I tensorflow/core/common_runtime/bfc_allocator.cc:780] Bin for 37.22MiB was 32.00MiB, Chunk State:
2020-04-27 16:32:58.134840: I tensorflow/core/common_runtime/bfc_allocator.cc:793] Next region of size 268435456
...list of all in-use chunks...
2020-04-27 16:32:58.171559: I tensorflow/core/common_runtime/bfc_allocator.cc:816] Sum Total of in-use chunks: 494.04MiB
2020-04-27 16:32:58.171584: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 542113792 memory_limit_: 11861498266 available bytes: 11319384474 curr_region_allocation_bytes_: 4294967296
2020-04-27 16:32:58.171616: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit: 11861498266
InUse: 518035968
MaxInUse: 527939584
NumAllocs: 1157
MaxAllocSize: 177703936
2020-04-27 16:32:58.171763: W tensorflow/core/common_runtime/bfc_allocator.cc:319] *******_*************************_*****************************************************************x
2020-04-27 16:32:58.171820: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at conv_ops.cc:486 : Resource exhausted: OOM when allocating tensor with shape[1,256,185,206] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
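For completeness, here is the kind of diagnostic I would run to see what memory limit TensorFlow itself reports, plus the only TF 1.x options I know of that cap per-process GPU memory. This is a hypothetical sketch, not code from DeepLabCut; I have not set per_process_gpu_memory_fraction anywhere, so if something is capping the allocator it must be coming from somewhere else:

import tensorflow as tf
from tensorflow.python.client import device_lib

# Print the devices TensorFlow sees, including the per-device memory limit,
# to compare against the ~11.3 GB limit reported in the log above.
print(device_lib.list_local_devices())

# The only per-process GPU memory controls I am aware of in TF 1.x.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True                      # what I intended to enable
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # hard cap; NOT set in my runs
sess = tf.Session(config=config)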