Could not identify NUMA node of platform GPU

Asked: 2019-04-04 08:26:01

Tags: python tensorflow keras

I am trying to get TensorFlow running on my machine, but I keep getting stuck at the "Could not identify NUMA node" message.

I am using a Conda environment:

  • tensorflow-gpu 1.12.0
  • cudatoolkit 9.0
  • cudnn 7.1.2
  • nvidia-smi reports: driver version 418.43, CUDA version 10.1

This is the error output:

>>> import tensorflow as tf
>>> tf.Session()
2019-04-04 09:56:59.851321: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-04 09:56:59.950066: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2019-04-04 09:56:59.950762: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.0845
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.84GiB
2019-04-04 09:56:59.950794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-04-04 09:59:45.338767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-04 09:59:45.338799: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-04-04 09:59:45.338810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-04-04 09:59:45.339017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1193] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Unfortunately, I have no idea what to do with this error output.
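The log above names a sysfs file that TensorFlow could not open, then says it is defaulting to node 0; the process actually terminates on std::bad_alloc. As a minimal sketch for looking at that file directly, the helper below tries to read it. The function name read_numa_node and its sysfs_root parameter are made up for illustration; the PCI bus ID is the one from the log. This is a diagnostic aid only, not a fix.

```python
import os


def read_numa_node(pci_bus_id, sysfs_root="/sys/bus/pci/devices"):
    """Return the NUMA node recorded for a PCI device, or None if unreadable.

    read_numa_node and sysfs_root are hypothetical names for this sketch.
    """
    path = os.path.join(sysfs_root, pci_bus_id, "numa_node")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not an integer.
        return None


if __name__ == "__main__":
    # PCI bus ID taken from the log output above.
    print("numa_node:", read_numa_node("0000:01:00.0"))
```

If this prints None, the file is missing or unreadable, which matches the "could not open file to read NUMA node" line in the log.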

1 Answer:

Answer 0 (score: 0)

I was able to fix it in a new conda environment:

conda create --name tf python=3
conda activate tf
conda install cudatoolkit=9.0 tensorflow-gpu=1.11.0

A table of compatible CUDA / TF combinations is available here. In my case, the combination of cudatoolkit=9.0 and tensorflow-gpu=1.12 inexplicably led to the std::bad_alloc error, whereas cudatoolkit=9.0 with tensorflow-gpu=1.11.0 works fine.
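To check a freshly created environment, something like the following sketch can be run inside it. It reports the installed TensorFlow version and whether TensorFlow considers a GPU usable, via tf.test.is_gpu_available(). The helper name gpu_report is made up for this example. Note that if the environment still has the original problem, the process would presumably abort with std::bad_alloc rather than return a result, since that is not a Python exception.

```python
def gpu_report():
    """Return the TensorFlow version and whether a GPU is usable.

    gpu_report is a hypothetical helper name for this sketch.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in the active environment.
        return {"tensorflow": None, "gpu_available": False}
    return {
        "tensorflow": tf.__version__,
        # Initializes the GPU devices, much like tf.Session() does.
        "gpu_available": bool(tf.test.is_gpu_available()),
    }


if __name__ == "__main__":
    print(gpu_report())
```

With the working combination from this answer, the report should show version 1.11.0 and, if the GPU was picked up, gpu_available set to True.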
