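I want to train a CNN model for image classification using Keras with the TensorFlow GPU backend. I have verified that TensorFlow can detect the GPU, but Keras is not using the GPU to train the model. Task Manager also shows 100% CPU utilization and 0% GPU utilization while the model is training.
I have installed:
I am using Windows 10 64-bit, a GeForce GTX 1050 (4 GB) GPU, and a 7th-generation Intel Core i5 CPU.
To install TensorFlow GPU, I used the following command: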
conda create --name tf_gpu tensorflow-gpu
I also tried the following 3 methods to force training on the GPU:
1.
with tensorflow.device('/gpu:0'):
    #code
2.
from keras import backend
assert len(backend.tensorflow_backend._get_available_gpus()) > 0
#code
3.
from keras import backend as K
K.tensorflow_backend._get_available_gpus()
#code
Packages installed in my virtual environment:
# packages in environment at C:\Users\Sreenivasa Reddy\Anaconda3\envs\tf_gpu:
#
# Name Version Build Channel
_tflow_select 2.1.0 gpu
absl-py 0.7.1 py37_0
alabaster 0.7.12 py37_0
asn1crypto 0.24.0 py37_0
astor 0.7.1 py37_0
astroid 2.2.5 py37_0
attrs 19.1.0 py37_1
babel 2.7.0 py_0
backcall 0.1.0 py37_0
blas 1.0 mkl
bleach 3.1.0 py37_0
ca-certificates 2019.5.15 0
certifi 2019.6.16 py37_0
cffi 1.12.3 py37h7a1dbc1_0
chardet 3.0.4 py37_1
cloudpickle 1.2.1 py_0
colorama 0.4.1 py37_0
cryptography 2.7 py37h7a1dbc1_0
cudatoolkit 10.0.130 0
cudnn 7.6.0 cuda10.0_0
decorator 4.4.0 py37_1
defusedxml 0.6.0 py_0
docutils 0.14 py37_0
entrypoints 0.3 py37_0
freetype 2.9.1 ha9979f8_1
gast 0.2.2 py37_0
grpcio 1.16.1 py37h351948d_1
h5py 2.9.0 py37h5e291fa_0
hdf5 1.10.4 h7ebc959_0
icc_rt 2019.0.0 h0cc432a_1
icu 58.2 ha66f8fd_1
idna 2.8 py37_0
imagesize 1.1.0 py37_0
intel-openmp 2019.4 245
ipykernel 5.1.1 py37h39e3cac_0
ipython 7.6.1 py37h39e3cac_0
ipython_genutils 0.2.0 py37_0
isort 4.3.21 py37_0
jedi 0.13.3 py37_0
jinja2 2.10.1 py37_0
jpeg 9b hb83a4c4_2
jsonschema 3.0.1 py37_0
jupyter_client 5.3.1 py_0
jupyter_core 4.5.0 py_0
Keras 2.2.4 <pip>
keras-applications 1.0.8 py_0
keras-preprocessing 1.1.0 py_1
keyring 18.0.0 py37_0
lazy-object-proxy 1.4.1 py37he774522_0
libpng 1.6.37 h2a8f88b_0
libprotobuf 3.8.0 h7bd577a_0
libsodium 1.0.16 h9d3ae62_0
libtiff 4.0.10 hb898794_2
markdown 3.1.1 py37_0
markupsafe 1.1.1 py37he774522_0
mccabe 0.6.1 py37_1
mistune 0.8.4 py37he774522_0
mkl 2019.4 245
mkl_fft 1.0.12 py37h14836fe_0
mkl_random 1.0.2 py37h343c172_0
mock 3.0.5 py37_0
nbconvert 5.5.0 py_0
nbformat 4.4.0 py37_0
numpy 1.16.4 py37h19fb1c0_0
numpy-base 1.16.4 py37hc3f5095_0
numpydoc 0.9.1 py_0
olefile 0.46 py37_0
openssl 1.1.1c he774522_1
packaging 19.0 py37_0
pandoc 2.2.3.2 0
pandocfilters 1.4.2 py37_1
parso 0.5.0 py_0
pickleshare 0.7.5 py37_0
pillow 6.1.0 py37hdc69c19_0
pip 19.1.1 py37_0
prompt_toolkit 2.0.9 py37_0
protobuf 3.8.0 py37h33f27b4_0
psutil 5.6.3 py37he774522_0
pycodestyle 2.5.0 py37_0
pycparser 2.19 py37_0
pyflakes 2.1.1 py37_0
pygments 2.4.2 py_0
pylint 2.3.1 py37_0
pyopenssl 19.0.0 py37_0
pyparsing 2.4.0 py_0
pyqt 5.9.2 py37h6538335_2
pyreadline 2.1 py37_1
pyrsistent 0.14.11 py37he774522_0
pysocks 1.7.0 py37_0
python 3.7.3 h8c8aaf0_1
python-dateutil 2.8.0 py37_0
pytz 2019.1 py_0
pywin32 223 py37hfa6e2cd_1
PyYAML 5.1.1 <pip>
pyzmq 18.0.0 py37ha925a31_0
qt 5.9.7 vc14h73c81de_0
qtawesome 0.5.7 py37_1
qtconsole 4.5.1 py_0
qtpy 1.8.0 py_0
requests 2.22.0 py37_0
rope 0.14.0 py_0
scipy 1.2.1 py37h29ff71c_0
setuptools 41.0.1 py37_0
sip 4.19.8 py37h6538335_0
six 1.12.0 py37_0
snowballstemmer 1.9.0 py_0
sphinx 2.1.2 py_0
sphinxcontrib-applehelp 1.0.1 py_0
sphinxcontrib-devhelp 1.0.1 py_0
sphinxcontrib-htmlhelp 1.0.2 py_0
sphinxcontrib-jsmath 1.0.1 py_0
sphinxcontrib-qthelp 1.0.2 py_0
sphinxcontrib-serializinghtml 1.1.3 py_0
spyder 3.3.6 py37_0
spyder-kernels 0.5.1 py37_0
sqlite 3.29.0 he774522_0
tensorboard 1.13.1 py37h33f27b4_0
tensorflow 1.13.1 gpu_py37h83e5d6a_0
tensorflow-base 1.13.1 gpu_py37h871c8ca_0
tensorflow-estimator 1.13.0 py_0
tensorflow-gpu 1.13.1 h0d30ee6_0
termcolor 1.1.0 py37_1
testpath 0.4.2 py37_0
tk 8.6.8 hfa6e2cd_0
tornado 6.0.3 py37he774522_0
traitlets 4.3.2 py37_0
urllib3 1.24.2 py37_0
vc 14.1 h0510ff6_4
vs2015_runtime 14.15.26706 h3a45250_4
wcwidth 0.1.7 py37_0
webencodings 0.5.1 py37_1
werkzeug 0.15.4 py_0
wheel 0.33.4 py37_0
win_inet_pton 1.1.0 py37_0
wincertstore 0.2 py37_0
wrapt 1.11.2 py37he774522_0
xz 5.2.4 h2fa13f4_4
zeromq 4.3.1 h33f27b4_3
zlib 1.2.11 h62dcd97_3
zstd 1.3.7 h508b16e_0
Checking whether TensorFlow detects the GPU:
Python 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2019-07-22 17:05:26.706907: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-07-22 17:05:26.916585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.30GiB
2019-07-22 17:05:26.923097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-07-22 17:05:27.594264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-22 17:05:27.598321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-07-22 17:05:27.600418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-07-22 17:05:27.602687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 3011 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 17686286348873888351
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3157432729
locality {
bus_id: 1
links {
}
}
incarnation: 5873520528294819841
physical_device_desc: "device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1"
]
My Keras code:
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPool2D
from keras.layers import Flatten
from keras.layers import Dense
from keras import backend as K
K.tensorflow_backend._get_available_gpus()
classifier=Sequential()
classifier.add(Convolution2D(32,3,3,input_shape=(32,32,3),activation='relu'))
classifier.add(MaxPool2D(pool_size=(2,2)))
classifier.add(Convolution2D(32,3,3,activation='relu'))
classifier.add(MaxPool2D(pool_size=(2,2)))
classifier.add(Convolution2D(64,3,3,activation='relu'))
classifier.add(MaxPool2D(pool_size=(2,2)))
classifier.add(Flatten())
classifier.add(Dense(output_dim=128, activation='relu'))
classifier.add(Dense(output_dim=1, activation='sigmoid'))
classifier.compile(optimizer='adam',loss='binary_crossentropy', metrics=['accuracy'])
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
training_set = train_datagen.flow_from_directory(
    'C:/Users/Sreenivasa Reddy/Desktop/Deep_Learning_A_Z/Volume_1_Supervised_Deep_Learning/Part2_Convolutional_Neural_Networks/Convolutional_Neural_Networks/dataset/training_set',
    target_size=(32, 32),
    batch_size=32,
    class_mode='binary')
test_set = test_datagen.flow_from_directory(
    'C:/Users/Sreenivasa Reddy/Desktop/Deep_Learning_A_Z/Volume_1_Supervised_Deep_Learning/Part2_Convolutional_Neural_Networks/Convolutional_Neural_Networks/dataset/test_set',
    target_size=(32, 32),
    batch_size=32,
    class_mode='binary')
classifier.fit_generator(
    training_set,
    steps_per_epoch=8000,
    epochs=25,
    validation_data=test_set,
    validation_steps=2000)
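In Keras 2, `steps_per_epoch` and `validation_steps` in `fit_generator` count batches, not images. With the numbers in this post (8000 training images and 2000 test images, as reported by `flow_from_directory` in the console output below, and `batch_size=32`), one pass over each set takes ceil(n / 32) batches. A minimal sketch of that arithmetic; the helper name `batches_per_epoch` is hypothetical:

```python
import math

def batches_per_epoch(num_images: int, batch_size: int) -> int:
    """Batches needed to see every image once (the last batch may be partial)."""
    return math.ceil(num_images / batch_size)

# Image counts reported by flow_from_directory, batch_size from the code above
train_steps = batches_per_epoch(8000, 32)
val_steps = batches_per_epoch(2000, 32)
print(train_steps, val_steps)
```

A `DirectoryIterator` such as `training_set` should also report this number via `len(training_set)`, so passing `steps_per_epoch=len(training_set)` and `validation_steps=len(test_set)` avoids hard-coding it.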
Output in the IPython console:
from keras import backend as K
K.tensorflow_backend._get_available_gpus()
Out[15]: ['/job:localhost/replica:0/task:0/device:GPU:0']
__main__:2: UserWarning: Update your `Conv2D` call to the Keras 2 API: `Conv2D(32, (3, 3), input_shape=(32, 32, 3..., activation="relu")`
__main__:4: UserWarning: Update your `Conv2D` call to the Keras 2 API: `Conv2D(32, (3, 3), activation="relu")`
__main__:6: UserWarning: Update your `Conv2D` call to the Keras 2 API: `Conv2D(64, (3, 3), activation="relu")`
__main__:9: UserWarning: Update your `Dense` call to the Keras 2 API: `Dense(activation="relu", units=128)`
__main__:10: UserWarning: Update your `Dense` call to the Keras 2 API: `Dense(activation="sigmoid", units=1)`
Found 8000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
Epoch 1/25
782/8000 [=>............................] - ETA: 17:38 - loss: 0.6328 - acc: 0.6310
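Note: I stopped the kernel after letting it run for a while, so that I could copy the output from the IPython console.
EDIT: I also trained RNN and ANN models, and when I checked Task Manager during training, CUDA utilization was about 35%, but for the CNN model it is only 2%. Isn't 35% CUDA utilization already low? And why doesn't the CNN reach even 35%?
EDIT2: Strangely, when I increase the batch size the model trains very slowly, and when I decrease it (i.e., set it to 1) the model trains faster. Is there an explanation for this?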
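For scale, the CNN in the question is very small. A back-of-the-envelope count of its output shapes and trainable parameters, assuming Keras defaults ('valid' padding, stride 1 for the convolutions, pool stride equal to `pool_size`), gives about 62k parameters on 32×32 inputs. One plausible contributor to the low CUDA utilization (a hypothesis, not confirmed in this thread) is that each batch needs so little GPU work that the GPU mostly waits for `ImageDataGenerator`, which loads and augments images on the CPU. The helper functions below are hypothetical:

```python
def conv_valid(size: int, kernel: int = 3) -> int:
    """Spatial size after a stride-1 convolution with no padding ('valid')."""
    return size - kernel + 1

def pool(size: int, p: int = 2) -> int:
    """Spatial size after MaxPool2D(pool_size=p) with the default stride p."""
    return size // p

def conv_params(in_ch: int, out_ch: int, kernel: int = 3) -> int:
    return kernel * kernel * in_ch * out_ch + out_ch  # weights + biases

def dense_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out  # weights + biases

size, ch, total = 32, 3, 0             # input_shape=(32, 32, 3)
for filters in (32, 32, 64):           # the three Convolution2D + MaxPool2D pairs
    total += conv_params(ch, filters)
    size, ch = pool(conv_valid(size)), filters
flat = size * size * ch                # output length of Flatten()
total += dense_params(flat, 128) + dense_params(128, 1)
print(size, flat, total)
```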
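One likely factor: the code passes a fixed `steps_per_epoch=8000`, so each epoch draws 8000 batches whatever the batch size. The number of images processed per epoch therefore grows linearly with `batch_size`, and with 8000 training images a batch size of 32 makes one "epoch" equal to 32 passes over the data. A short calculation under those assumptions (the names below are illustrative):

```python
STEPS_PER_EPOCH = 8000      # value passed to fit_generator in the question
NUM_TRAIN_IMAGES = 8000     # "Found 8000 images belonging to 2 classes."

def images_per_epoch(batch_size: int, steps: int = STEPS_PER_EPOCH) -> int:
    """Images consumed in one epoch when the number of steps is fixed."""
    return steps * batch_size

for bs in (1, 32, 64):
    n = images_per_epoch(bs)
    # How many full passes over the training set one "epoch" actually is
    print(f"batch_size={bs:>2}: {n:>6} images/epoch = {n / NUM_TRAIN_IMAGES:g} passes")
```

So a larger batch makes each epoch slower in wall-clock time even if every individual batch is processed efficiently; deriving `steps_per_epoch` from the dataset size keeps one epoch equal to one pass.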
Answer (score: 1)
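I'm posting this as an answer because I don't have enough reputation to comment yet :/
You mentioned that you tried different methods:
"with tensorflow.device('/gpu:0'): #code ..."
I can't see them, or any other way of using the GPU, in the code you posted, but I assume you used them to get the output above?
What happens if you use these methods? Does it still only use the CPU, or does it throw an error?
Could you try something like this and post the result?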
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)

# Creates a graph.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
As described in this example: https://dzone.com/articles/how-to-train-tensorflow-models-using-gpus