I'm using Keras with the TensorFlow backend. When I run nvidia-smi, I can see that Python allocates memory on the GPU, but it doesn't seem to actually use it. On top of that, the computation runs very slowly (~300 s instead of ~15 s). I'm using a GTX 980.
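For context, here is a minimal check (a sketch, not part of the script below) that lists the devices TensorFlow registers; I would expect /gpu:0 to show up for the GTX 980:

from tensorflow.python.client import device_lib

# Should list /cpu:0 and /gpu:0 if TensorFlow can see the GTX 980
print(device_lib.list_local_devices())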
My Python 3 code:
# coding: utf-8
# ## Set up Libraries
import keras as K
import numpy as np
from keras.layers import Activation, Dense, Flatten, Lambda
from keras.models import Sequential
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator
# ## Code
def one_hot(x):
    matrix = np.zeros([x.size, np.max(x) + 1])
    matrix[np.arange(x.size), x] = 1
    return matrix
# ## Prepare Data
from keras.datasets import mnist
(X_train, Y_train_raw), (X_test, Y_test_raw) = mnist.load_data()
X_train = np.expand_dims(X_train, 3)
Y_train_raw = np.expand_dims(Y_train_raw, 3)
X_test = np.expand_dims(X_test, 3)
Y_test_raw = np.expand_dims(Y_test_raw, 3)
mnist_mean = X_train.mean().astype(np.float32)
mnist_stddev = X_train.std().astype(np.float32)
def normalize_mnist_input(x):
    return (x - mnist_mean) / mnist_stddev
Y_train = one_hot(Y_train_raw)
Y_test = one_hot(Y_test_raw)
X_valid = X_train[50000:]
Y_valid = Y_train[50000:]
X_train = X_train[0:50000]
Y_train = Y_train[0:50000]
# ## Fit Simple Model
def linear_model():
    model = Sequential([
        Lambda(normalize_mnist_input, input_shape=(28, 28, 1)),
        Flatten(),
        Dense(10, activation="softmax")
    ])
    model.compile(Adam(), loss="categorical_crossentropy", metrics=['accuracy'])
    return model
linear_model = linear_model()
image_generator = ImageDataGenerator()
train_batches = image_generator.flow(X_train, Y_train, batch_size=64)
test_batches = image_generator.flow(X_test, Y_test, batch_size=64)
linear_model.fit_generator(train_batches, train_batches.n,
                           validation_data=test_batches, validation_steps=test_batches.n,
                           epochs=1)
When I run this test script, it does use the GPU:
import tensorflow as tf
# Creates a graph.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
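As a rough sanity check (a sketch only; the 4000x4000 size is arbitrary), the same pattern can be timed with a larger matmul so the run doesn't finish instantly:

import time
import tensorflow as tf

# Build a bigger graph so the GPU has real work to do
with tf.device('/gpu:0'):
    m = tf.random_normal([4000, 4000])
    prod = tf.matmul(m, m)

# log_device_placement shows whether the matmul actually lands on /gpu:0
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
start = time.time()
sess.run(prod)
print('matmul took %.3f s' % (time.time() - start))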
I'm also using nvidia-docker, and it works when I run the test Docker image.
My Dockerfile looks basically like this:
FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04
### basic utilities
RUN apt-get update && \
    apt-get --assume-yes upgrade && \
    apt-get --assume-yes install binutils build-essential curl gcc git g++ \
        libfreetype6-dev libpng12-dev libzmq3-dev pkg-config make nano rsync \
        software-properties-common unzip wget && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

### anaconda & tensorflow
RUN cd /tmp && \
    wget "https://repo.continuum.io/archive/Anaconda3-4.3.1-Linux-x86_64.sh" -O "Anaconda.sh" && \
    bash "Anaconda.sh" -b && \
    echo "export PATH=\"$HOME/anaconda3/bin:\$PATH\"" >> ~/.bashrc && \
    export PATH="$HOME/anaconda3/bin:$PATH" && \
    conda install -y bcolz && \
    conda upgrade -y --all && \
    pip install Pillow && \
    pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.1-cp36-cp36m-linux_x86_64.whl && \
    pip install keras
Why isn't my Python script at the top using the GPU properly? I can see that it allocates about 3 gigabytes of GPU memory, but it doesn't seem to do any computation with it.
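One more diagnostic I can think of (a minimal sketch, assuming the TensorFlow backend): hand Keras a session with device-placement logging enabled before building the model, so the fit_generator run above logs where each op is placed:

import tensorflow as tf
from keras import backend as K

# Make Keras use a session that logs op placement (run this before building the model)
config = tf.ConfigProto(log_device_placement=True)
K.set_session(tf.Session(config=config))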