My code first:
# Imports (make_blobs now lives in sklearn.datasets; the old
# sklearn.datasets.samples_generator path is deprecated)
from sklearn.datasets import make_blobs
from keras.utils import to_categorical, multi_gpu_model
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD

# Generate a 3-class blob dataset and one-hot encode the labels
X, y = make_blobs(n_samples=1000000, centers=3, n_features=3, cluster_std=2, random_state=2)
y = to_categorical(y)

# 50/50 train/test split
n_train = 500000
trainX, testX = X[:n_train, :], X[n_train:, :]
trainY, testY = y[:n_train], y[n_train:]

# Small MLP: one hidden ReLU layer, softmax over the 3 classes
model = Sequential()
model.add(Dense(50, input_dim=3, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(3, activation='softmax'))

# Replicate the model across 4 GPUs; each batch is split among them
p_model = multi_gpu_model(model, gpus=4)
opt = SGD(lr=0.01, momentum=0.9)
p_model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

# Train, then evaluate on both splits
history = p_model.fit(trainX, trainY, validation_data=(testX, testY), epochs=20, verbose=1, batch_size=32)
_, train_acc = p_model.evaluate(trainX, trainY, verbose=0)
_, test_acc = p_model.evaluate(testX, testY, verbose=0)
print("Train: %.3f, Test: %.3f" % (train_acc, test_acc))
The way I use the 4 powerful GPUs is the following:
p_model = multi_gpu_model(model, gpus=4)
While training is in progress, I can see the following (so the GPUs are fully utilized?):
(tf_gpu) [martin@A08-R32-I196-3-FZ2LTP2 mlm]$ nvidia-smi
Wed Jan 23 09:08:24 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P40 Off | 00000000:02:00.0 Off | 0 |
| N/A 29C P0 49W / 250W | 21817MiB / 22919MiB | 9% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P40 Off | 00000000:04:00.0 Off | 0 |
| N/A 34C P0 50W / 250W | 21817MiB / 22919MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P40 Off | 00000000:83:00.0 Off | 0 |
| N/A 28C P0 48W / 250W | 21817MiB / 22919MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P40 Off | 00000000:84:00.0 Off | 0 |
| N/A 36C P0 51W / 250W | 21817MiB / 22919MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 114918 C python 21807MiB |
| 1 114918 C python 21807MiB |
| 2 114918 C python 21807MiB |
| 3 114918 C python 21807MiB |
+-----------------------------------------------------------------------------+
However, it is not any faster than a single-GPU run, or even than a run on my Mac desktop. The whole training takes about 20 minutes, almost the same as single-GPU training, and much slower than on my personal Mac. Why is that?
Answer 0 (score: 0)
Increase batch_size beyond 32, and keep increasing it until the GPUs are fully utilized. With multi_gpu_model each batch is split across the 4 GPUs, so at batch_size=32 every GPU gets only 8 samples per step, and the overhead of scattering the inputs and merging the gradients dominates the actual compute. Yes, a larger batch may affect your model, but it does significantly improve throughput; you will have to find a sweet spot for that batch_size.
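As a minimal sketch of that adjustment (the per-GPU batch of 2048 is an assumed illustrative starting point, not a tuned value; scaling the total batch with the GPU count is a common heuristic):

# multi_gpu_model splits each batch across the GPUs, so scale the total
# batch size with the GPU count. 2048 per GPU is an assumed starting
# point -- keep raising it until nvidia-smi shows high GPU-Util.
n_gpus = 4
per_gpu_batch = 2048
history = p_model.fit(trainX, trainY,
                      validation_data=(testX, testY),
                      epochs=20, verbose=1,
                      batch_size=per_gpu_batch * n_gpus)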