I am using an Amazon EC2 instance with 16 GPUs for computation. After I set up everything I needed and tested it in Python, something strange happened.
Here is what I tried:
import tensorflow as tf
import time
a = time.time()  # record the start time
hello = tf.constant('hello')
sess = tf.Session()  # TensorFlow 1.x API; the log output below appears after this call
After running the above, I got a very long message:
2018-01-31 07:10:27.922290: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-31 07:10:27.922347: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-31 07:10:27.922360: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-01-31 07:10:27.922371: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-31 07:10:27.922381: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-01-31 07:11:05.263488: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-31 07:11:05.265392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:0f.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2018-01-31 07:11:05.487461: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x56312fdf3970 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2018-01-31 07:11:05.488072: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-31 07:11:05.489826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 1 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:10.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2018-01-31 07:11:05.707955: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x56312fdf7e80 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2018-01-31 07:11:05.708452: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-31 07:11:05.709916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 2 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:11.0
Total memory: 11.17GiB
Free memory: 11.10GiB
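... and it goes on like this.
It looks like TensorFlow is scanning the GPU devices, but it is very slow. I waited 5 minutes to see the output above, and after that it stayed stuck until Amazon automatically disconnected me. I had done the same thing before on my lab server, which has 4 Tesla K40s, and everything went smoothly.
Does anyone know why this happens?
Answer 0 (score: 0)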
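The snippet in the question records a start time in `a` but never reads it back. A minimal sketch of how the session start-up could be timed is shown here; the helper name `timed` is hypothetical and not part of TensorFlow, and the usage in the comments assumes the TensorFlow 1.x API used in the question.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    `timed` is a hypothetical helper for illustration only.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Assumed usage with the TensorFlow 1.x API from the question:
#   import tensorflow as tf
#   sess, seconds = timed(tf.Session)
#   print("Session creation took %.1f s" % seconds)
```

Timing only the `tf.Session()` call would help separate the device-discovery delay from the import time.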
After a lot of trial and error, I finally solved the problem. I uninstalled everything and reinstalled the NVIDIA driver from the deb file, but installed it with the following command:
[the command is missing from this copy of the answer]
Then I used Anaconda to install accelerate and tensorflow, and afterwards installed CUDA following the standard procedure.
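After a driver reinstall like the one described, it may be worth checking that the driver itself lists every GPU before starting TensorFlow. The following is a rough sketch, not part of the original answer: it assumes `nvidia-smi` is on the PATH, and the function name `list_gpus` is hypothetical.

```python
import shutil
import subprocess


def list_gpus():
    """Return one line per GPU reported by `nvidia-smi -L`.

    Returns an empty list if nvidia-smi is not installed or the call fails.
    `list_gpus` is a hypothetical helper for illustration only.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    proc = subprocess.run(
        ["nvidia-smi", "-L"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        return []
    return [line for line in proc.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    print("GPUs visible to the driver: %d" % len(gpus))
    for line in gpus:
        print(line)
```

If this call is itself slow or lists fewer devices than expected, the problem is likely at the driver level rather than in TensorFlow.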