I'm building a Keras model to run some simple image recognition tasks. If I do everything in raw Keras, I don't hit OOM. Strangely, though, when I run the same thing through a mini framework I wrote, which is quite simple and mainly exists so I can track the hyperparameters and settings I use, I hit OOM. Most of the execution should be identical to running raw Keras, so I'm guessing I made a mistake somewhere in my code. Note that the same mini framework runs fine on CPU on my local laptop. I suppose I need to debug, but before I do, does anyone have any general advice?
Here are a few lines of the error I'm getting:
Epoch 1/50
2018-05-18 17:40:27.435366: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-18 17:40:27.435906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235 pciBusID: 0000:00:04.0 totalMemory: 11.17GiB freeMemory: 504.38MiB
2018-05-18 17:40:27.435992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-05-18 17:40:27.784517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-18 17:40:27.784675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-05-18 17:40:27.784724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-05-18 17:40:27.785072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 243 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
2018-05-18 17:40:38.569609: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 36.00MiB. Current allocation summary follows.
2018-05-18 17:40:38.569702: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (256): Total Chunks: 66, Chunks in use: 66. 16.5KiB allocated for chunks. 16.5KiB in use in bin. 2.3KiB client-requested in use in bin.
2018-05-18 17:40:38.569768: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (512): Total Chunks: 10, Chunks in use: 10. 5.0KiB allocated for chunks. 5.0KiB in use in bin. 5.0KiB client- etc. etc
2018-05-18 17:40:38.573706: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[18432,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Answer 0 (score: 1)
This is caused by the GPU running out of memory, as the warning makes clear. Note that in your log only 504.38MiB of the K80's 11.17GiB was free when TensorFlow started, so something else was probably already holding most of the GPU memory.

The first workaround is, if possible, to allow GPU memory growth by writing this Config proto and passing it to tf.Session():

# See https://www.tensorflow.org/tutorials/using_gpu#allowing_gpu_memory_growth
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

Then pass this config to the session that is causing the error, like:

sess = tf.Session(config=config)

If this doesn't help, you can disable the GPU for the particular session that is causing the error, like this:

config = tf.ConfigProto(device_count={'GPU': 0})

If you are using Keras, you can get the Keras backend, extract the session, and apply these configs there.
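As a sketch of that last step (assuming a TF1-era setup where Keras runs on a TensorFlow session, and that `keras` is importable in your environment), you can build the session yourself and hand it to the Keras backend before constructing any models:

```python
import tensorflow as tf
from keras import backend as K

# Let TensorFlow grab GPU memory on demand instead of reserving it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Make this session the one Keras uses for all subsequent model building/training.
K.set_session(tf.Session(config=config))
```

Do this once at the top of your mini framework, before any model is built; sessions created afterwards by Keras will not pick up the config.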