Question

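I trained a model with TensorFlow in Python without any problems. I am now trying to integrate inference for this model into pre-existing OpenGL software. However, I get CUDA_ERROR_OUT_OF_MEMORY during cuInit (that is, even earlier than loading the model, just when creating the session). It does seem that OpenGL has already taken some GPU memory (around 300 MB), as shown by gpustat or nvidia-smi.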

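Is it possible that there is a clash because both TF and OpenGL try to access/allocate GPU memory? Has anyone encountered this problem before? Most references I found by googling concern failures at model loading time, not at session/CUDA initialization. Is this completely unrelated to OpenGL, and am I just barking up the wrong tree? A simple TF C++ inference example works. Any help is appreciated.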

Here is the tensorflow log output, for completeness:

2018-01-08 12:11:38.321136: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-01-08 12:11:38.379100: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_OUT_OF_MEMORY
2018-01-08 12:11:38.379388: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: rosenblatt
2018-01-08 12:11:38.379413: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: rosenblatt
2018-01-08 12:11:38.379508: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 384.98.0
2018-01-08 12:11:38.380425: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.98  Thu Oct 26 15:16:01 PDT 2017 GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)"""
2018-01-08 12:11:38.380481: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 384.98.0
2018-01-08 12:11:38.380497: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 384.98.0

Edit: Removing all references to OpenGL results in the same problem, so it has nothing to do with a clash between the libraries.

Answer 1

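OK, the problem was that sanitizers were enabled in the debug build of the binary. The release build, or a debug build without sanitizers, works as expected.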
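
In case it helps someone else hitting the same thing: a minimal sketch of how a program could detect at compile time that it was built with a sanitizer and print a warning before creating the TF session. The answer does not say which sanitizer was involved, so checking AddressSanitizer and ThreadSanitizer here is an assumption, and the helper names (built_with_sanitizer, sanitizer_warning) are hypothetical, not part of TensorFlow or CUDA.

```cpp
#include <string>

// Clang reports sanitizers through __has_feature; provide a fallback
// for compilers that do not define it.
#ifndef __has_feature
#define __has_feature(x) 0
#endif

// Hypothetical helper: true if this translation unit was compiled with
// AddressSanitizer or ThreadSanitizer (GCC macros or Clang feature checks).
constexpr bool built_with_sanitizer() {
#if defined(__SANITIZE_ADDRESS__) || defined(__SANITIZE_THREAD__) || \
    __has_feature(address_sanitizer) || __has_feature(thread_sanitizer)
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: returns a warning message if sanitized, else an
// empty string. The flag is a parameter so the logic is easy to test.
inline std::string sanitizer_warning(bool sanitized = built_with_sanitizer()) {
    if (!sanitized) {
        return "";
    }
    return "warning: built with a sanitizer; cuInit may fail with "
           "CUDA_ERROR_OUT_OF_MEMORY. Try a release build or a debug "
           "build without sanitizers.";
}
```

Calling sanitizer_warning() once at startup and writing any non-empty result to stderr, before the session is created, would make the cause visible next to the cuInit error in the log.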

OpenGL program with tensorflow C++ gives failed call to cuInit: CUDA_ERROR_OUT_OF_MEMORY
