Question

我想使用keras tensorflow GPU后端训练CNN模型进行图像分类。我已经检查过并且tensorflow能够检测GPU。但是keras没有使用GPU来训练模型。任务管理器还指出，在训练模型时，CPU利用率为100％，GPU为0％。

我已经安装了

Visual Studio Community 2017
Python 3.7.3
CUDA 10.0
Cudnn 7.6
Anaconda

我正在使用Windows 10 64位，GPU 1050 GTX 4GB，CPU英特尔i5第7代。

要安装tensorflow GPU，我使用了以下命令

conda create --name tf_gpu tensorflow-gpu

我还尝试了以下3种方法来强制GPU进行训练

with tensorflow.device('/gpu:0'):
    #code

from keras import backend
assert len(backend.tensorflow_backend._get_available_gpus()) > 0
     #code

from keras import backend as K
K.tensorflow_backend._get_available_gpus()
     #code

我已在虚拟环境中安装的软件包

# packages in environment at C:\Users\Sreenivasa Reddy\Anaconda3\envs\tf_gpu:
#
# Name                    Version                   Build  Channel
_tflow_select             2.1.0                       gpu
absl-py                   0.7.1                    py37_0
alabaster                 0.7.12                   py37_0
asn1crypto                0.24.0                   py37_0
astor                     0.7.1                    py37_0
astroid                   2.2.5                    py37_0
attrs                     19.1.0                   py37_1
babel                     2.7.0                      py_0
backcall                  0.1.0                    py37_0
blas                      1.0                         mkl
bleach                    3.1.0                    py37_0
ca-certificates           2019.5.15                     0
certifi                   2019.6.16                py37_0
cffi                      1.12.3           py37h7a1dbc1_0
chardet                   3.0.4                    py37_1
cloudpickle               1.2.1                      py_0
colorama                  0.4.1                    py37_0
cryptography              2.7              py37h7a1dbc1_0
cudatoolkit               10.0.130                      0
cudnn                     7.6.0                cuda10.0_0
decorator                 4.4.0                    py37_1
defusedxml                0.6.0                      py_0
docutils                  0.14                     py37_0
entrypoints               0.3                      py37_0
freetype                  2.9.1                ha9979f8_1
gast                      0.2.2                    py37_0
grpcio                    1.16.1           py37h351948d_1
h5py                      2.9.0            py37h5e291fa_0
hdf5                      1.10.4               h7ebc959_0
icc_rt                    2019.0.0             h0cc432a_1
icu                       58.2                 ha66f8fd_1
idna                      2.8                      py37_0
imagesize                 1.1.0                    py37_0
intel-openmp              2019.4                      245
ipykernel                 5.1.1            py37h39e3cac_0
ipython                   7.6.1            py37h39e3cac_0
ipython_genutils          0.2.0                    py37_0
isort                     4.3.21                   py37_0
jedi                      0.13.3                   py37_0
jinja2                    2.10.1                   py37_0
jpeg                      9b                   hb83a4c4_2
jsonschema                3.0.1                    py37_0
jupyter_client            5.3.1                      py_0
jupyter_core              4.5.0                      py_0
Keras                     2.2.4                     <pip>
keras-applications        1.0.8                      py_0
keras-preprocessing       1.1.0                      py_1
keyring                   18.0.0                   py37_0
lazy-object-proxy         1.4.1            py37he774522_0
libpng                    1.6.37               h2a8f88b_0
libprotobuf               3.8.0                h7bd577a_0
libsodium                 1.0.16               h9d3ae62_0
libtiff                   4.0.10               hb898794_2
markdown                  3.1.1                    py37_0
markupsafe                1.1.1            py37he774522_0
mccabe                    0.6.1                    py37_1
mistune                   0.8.4            py37he774522_0
mkl                       2019.4                      245
mkl_fft                   1.0.12           py37h14836fe_0
mkl_random                1.0.2            py37h343c172_0
mock                      3.0.5                    py37_0
nbconvert                 5.5.0                      py_0
nbformat                  4.4.0                    py37_0
numpy                     1.16.4           py37h19fb1c0_0
numpy-base                1.16.4           py37hc3f5095_0
numpydoc                  0.9.1                      py_0
olefile                   0.46                     py37_0
openssl                   1.1.1c               he774522_1
packaging                 19.0                     py37_0
pandoc                    2.2.3.2                       0
pandocfilters             1.4.2                    py37_1
parso                     0.5.0                      py_0
pickleshare               0.7.5                    py37_0
pillow                    6.1.0            py37hdc69c19_0
pip                       19.1.1                   py37_0
prompt_toolkit            2.0.9                    py37_0
protobuf                  3.8.0            py37h33f27b4_0
psutil                    5.6.3            py37he774522_0
pycodestyle               2.5.0                    py37_0
pycparser                 2.19                     py37_0
pyflakes                  2.1.1                    py37_0
pygments                  2.4.2                      py_0
pylint                    2.3.1                    py37_0
pyopenssl                 19.0.0                   py37_0
pyparsing                 2.4.0                      py_0
pyqt                      5.9.2            py37h6538335_2
pyreadline                2.1                      py37_1
pyrsistent                0.14.11          py37he774522_0
pysocks                   1.7.0                    py37_0
python                    3.7.3                h8c8aaf0_1
python-dateutil           2.8.0                    py37_0
pytz                      2019.1                     py_0
pywin32                   223              py37hfa6e2cd_1
PyYAML                    5.1.1                     <pip>
pyzmq                     18.0.0           py37ha925a31_0
qt                        5.9.7            vc14h73c81de_0
qtawesome                 0.5.7                    py37_1
qtconsole                 4.5.1                      py_0
qtpy                      1.8.0                      py_0
requests                  2.22.0                   py37_0
rope                      0.14.0                     py_0
scipy                     1.2.1            py37h29ff71c_0
setuptools                41.0.1                   py37_0
sip                       4.19.8           py37h6538335_0
six                       1.12.0                   py37_0
snowballstemmer           1.9.0                      py_0
sphinx                    2.1.2                      py_0
sphinxcontrib-applehelp   1.0.1                      py_0
sphinxcontrib-devhelp     1.0.1                      py_0
sphinxcontrib-htmlhelp    1.0.2                      py_0
sphinxcontrib-jsmath      1.0.1                      py_0
sphinxcontrib-qthelp      1.0.2                      py_0
sphinxcontrib-serializinghtml 1.1.3                      py_0
spyder                    3.3.6                    py37_0
spyder-kernels            0.5.1                    py37_0
sqlite                    3.29.0               he774522_0
tensorboard               1.13.1           py37h33f27b4_0
tensorflow                1.13.1          gpu_py37h83e5d6a_0
tensorflow-base           1.13.1          gpu_py37h871c8ca_0
tensorflow-estimator      1.13.0                     py_0
tensorflow-gpu            1.13.1               h0d30ee6_0
termcolor                 1.1.0                    py37_1
testpath                  0.4.2                    py37_0
tk                        8.6.8                hfa6e2cd_0
tornado                   6.0.3            py37he774522_0
traitlets                 4.3.2                    py37_0
urllib3                   1.24.2                   py37_0
vc                        14.1                 h0510ff6_4
vs2015_runtime            14.15.26706          h3a45250_4
wcwidth                   0.1.7                    py37_0
webencodings              0.5.1                    py37_1
werkzeug                  0.15.4                     py_0
wheel                     0.33.4                   py37_0
win_inet_pton             1.1.0                    py37_0
wincertstore              0.2                      py37_0
wrapt                     1.11.2           py37he774522_0
xz                        5.2.4                h2fa13f4_4
zeromq                    4.3.1                h33f27b4_3
zlib                      1.2.11               h62dcd97_3
zstd                      1.3.7                h508b16e_0

要检查tensorflow是否检测到GPU

Python 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2019-07-22 17:05:26.706907: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-07-22 17:05:26.916585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.30GiB
2019-07-22 17:05:26.923097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-07-22 17:05:27.594264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-22 17:05:27.598321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-07-22 17:05:27.600418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-07-22 17:05:27.602687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 3011 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 17686286348873888351
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3157432729
locality {
  bus_id: 1
  links {
  }
}
incarnation: 5873520528294819841
physical_device_desc: "device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1"
]

我的keras代码

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPool2D
from keras.layers import Flatten
from keras.layers import Dense
from keras import backend as K
K.tensorflow_backend._get_available_gpus()
classifier=Sequential()
    classifier.add(Convolution2D(32,3,3,input_shape=(32,32,3),activation='relu'))
    classifier.add(MaxPool2D(pool_size=(2,2)))
    classifier.add(Convolution2D(32,3,3,activation='relu'))
    classifier.add(MaxPool2D(pool_size=(2,2)))
    classifier.add(Convolution2D(64,3,3,activation='relu'))
    classifier.add(MaxPool2D(pool_size=(2,2)))
    classifier.add(Flatten())
    classifier.add(Dense(output_dim=128, activation='relu'))
    classifier.add(Dense(output_dim=1, activation='sigmoid'))
    classifier.compile(optimizer='adam',loss='binary_crossentropy', metrics=['accuracy'])

    from keras.preprocessing.image import ImageDataGenerator
    train_datagen = ImageDataGenerator(
            rescale=1./255,
            shear_range=0.2,
            zoom_range=0.2,
            horizontal_flip=True)

    test_datagen = ImageDataGenerator(rescale=1./255)

    training_set = train_datagen.flow_from_directory(
            'C:/Users/Sreenivasa Reddy/Desktop/Deep_Learning_A_Z/Volume_1_Supervised_Deep_Learning/Part2_Convolutional_Neural_Networks/Convolutional_Neural_Networks/dataset/training_set',
            target_size=(32, 32),
            batch_size=32,
            class_mode='binary')

    test_set = test_datagen.flow_from_directory(
            'C:/Users/Sreenivasa Reddy/Desktop/Deep_Learning_A_Z/Volume_1_Supervised_Deep_Learning/Part2_Convolutional_Neural_Networks/Convolutional_Neural_Networks/dataset/test_set',
            target_size=(32, 32),
            batch_size=32,
            class_mode='binary')

    classifier.fit_generator(
            training_set,
            steps_per_epoch=8000,
            epochs=25,
            validation_data=test_set,
            validation_steps=2000)

iPython控制台中的输出

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPool2D
from keras.layers import Flatten
from keras.layers import Dense

from keras import backend as K
K.tensorflow_backend._get_available_gpus()
Out[15]: ['/job:localhost/replica:0/task:0/device:GPU:0']

classifier=Sequential()
classifier.add(Convolution2D(32,3,3,input_shape=(32,32,3),activation='relu'))
classifier.add(MaxPool2D(pool_size=(2,2)))
classifier.add(Convolution2D(32,3,3,activation='relu'))
classifier.add(MaxPool2D(pool_size=(2,2)))
classifier.add(Convolution2D(64,3,3,activation='relu'))
classifier.add(MaxPool2D(pool_size=(2,2)))
classifier.add(Flatten())
classifier.add(Dense(output_dim=128, activation='relu'))
classifier.add(Dense(output_dim=1, activation='sigmoid'))
classifier.compile(optimizer='adam',loss='binary_crossentropy', metrics=['accuracy'])

from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
        'C:/Users/Sreenivasa Reddy/Desktop/Deep_Learning_A_Z/Volume_1_Supervised_Deep_Learning/Part2_Convolutional_Neural_Networks/Convolutional_Neural_Networks/dataset/training_set',
        target_size=(32, 32),
        batch_size=32,
        class_mode='binary')

test_set = test_datagen.flow_from_directory(
        'C:/Users/Sreenivasa Reddy/Desktop/Deep_Learning_A_Z/Volume_1_Supervised_Deep_Learning/Part2_Convolutional_Neural_Networks/Convolutional_Neural_Networks/dataset/test_set',
        target_size=(32, 32),
        batch_size=32,
        class_mode='binary')

classifier.fit_generator(
        training_set,
        steps_per_epoch=8000,
        epochs=25,
        validation_data=test_set,
        validation_steps=2000)
__main__:2: UserWarning: Update your `Conv2D` call to the Keras 2 API: `Conv2D(32, (3, 3), input_shape=(32, 32, 3..., activation="relu")`
__main__:4: UserWarning: Update your `Conv2D` call to the Keras 2 API: `Conv2D(32, (3, 3), activation="relu")`
__main__:6: UserWarning: Update your `Conv2D` call to the Keras 2 API: `Conv2D(64, (3, 3), activation="relu")`
__main__:9: UserWarning: Update your `Dense` call to the Keras 2 API: `Dense(activation="relu", units=128)`
__main__:10: UserWarning: Update your `Dense` call to the Keras 2 API: `Dense(activation="sigmoid", units=1)`
Found 8000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
Epoch 1/25
 782/8000 [=>............................] - ETA: 17:38 - loss: 0.6328 - acc: 0.6310

注意：我运行了一段时间以从iPython控制台复制代码段后便停止了内核

编辑：我训练了RNN和ANN模型，当我在训练时检查任务管理器时，CUDA利用率约为35％，但对于CNN模型，CUDA利用率为2％。 CUDA的35％普及率不低吗？为什么CNN不使用35％

EDIT2：奇怪的是，当我增加批量大小时，模型训练非常缓慢，当我减小批量大小时（即，当我将其设为1时），模型训练得更快了，对此有什么解释吗？

Answer 1

我在这里问我的问题，因为我尚未获得发表评论的特权：/

您提到您尝试了不同的方法：

“带有tensorflow.device（'/ gpu：0'）：#code ...

在您发布的代码中，我看不到它们或使用gpu的其他方法，但是我认为您曾经使用过gpu来获得上述输出？

如果使用这些方法会发生什么？仍然只使用GPU还是报错？

您可以尝试这样的方法并发布结果吗？

# Creates a graph.

with tf.device('/gpu:0'):

    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')

    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')

    c = tf.matmul(a, b)

# Creates a session with log_device_placement set to True.

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Runs the op.

print(sess.run(c))

如本例中所述：https://dzone.com/articles/how-to-train-tensorflow-models-using-gpus

即使安装了Tensorflow GPU，Keras深度学习也无法在GPU上运行

1 个答案: