I have an EC2 instance (p2.xlarge) with this AMI:
Deep Learning AMI (Ubuntu) Version 5.0 - ami-7336d50e
It comes preinstalled with the latest deep learning framework binaries in separate virtual environments: MXNet, TensorFlow, Caffe, Caffe2, PyTorch, Keras, Chainer, Theano, and CNTK, with NVIDIA CUDA, cuDNN, and NCCL fully configured.
When I start my program (I am trying to build an RNN with Keras), I get:
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.7.5 locally
After Keras starts, I get this:
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:1e.0
Total memory: 11.17GiB
Free memory: 11.10GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0)
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 12639 get requests, put_count=6277 evicted_count=1000 eviction_rate=0.159312 and unsatisfied allocation rate=0.590395
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
But training is not fast: my MacBook Pro is faster than my EC2 instance, and after every epoch I get this warning:
tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 4156 get requests, put_count=8233 evicted_count=4000 eviction_rate=0.48585 and unsatisfied allocation rate=0.000481232
I installed keras_gpu and tensorflow_gpu, and I am using the virtual environment for Keras 2 with TensorFlow.
Can you tell me if I am doing something wrong? How can a simple little MacBook be faster than an EC2 instance with these specs:
p2.xlarge (11.75 ECU, 4 vCPUs, 2.7 GHz Intel Xeon E5-2686 v4, 61 GiB memory, EBS only)
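One way to double-check that TensorFlow is actually using the GPU (a minimal sketch, using the TF 1.x-era `device_lib` API that matches the log output above; `gpu_available` is a helper name I made up for illustration):

```python
# Sketch: verify that TensorFlow can see the Tesla K80 at all.
# If no GPU device is listed, training silently falls back to the CPU,
# which would explain the EC2 instance being slower than a MacBook.
from tensorflow.python.client import device_lib

def gpu_available():
    """Return True if TensorFlow lists at least one GPU device."""
    return any(d.device_type == 'GPU' for d in device_lib.list_local_devices())

print(device_lib.list_local_devices())  # should include a /gpu:0 entry on p2.xlarge
print(gpu_available())
```

If this prints `False`, the `tensorflow_gpu` build or CUDA libraries are not being picked up inside the active virtual environment; if it prints `True`, the GPU is visible and the slowdown lies elsewhere (for example in the model or input pipeline).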
Answer 0 (score: 0)
The answer is simple. On the EC2 instance (p2.xlarge), the GPU that TensorFlow uses is a Tesla K80; this GPU is roughly 4x to 10x faster than a CPU, whereas my MacBook has 8 CPU cores.