Low GPU utilization in machine learning computation

Time: 2018-04-07 05:59:29

Tags: tensorflow keras caffe

My friend and I are taking on a Kaggle competition called "Plant Seedlings Classification". He built a Keras kernel as follows:

 import numpy as np
 import pandas as pd
 import datetime as dt
 import matplotlib.pyplot as plt
 from os import listdir, makedirs
 from os.path import join, exists, expanduser
 from keras import models
 from keras.preprocessing import image
 from keras.applications import xception
 from keras.preprocessing.image import ImageDataGenerator
 from keras.callbacks import TensorBoard
 from keras.callbacks import ModelCheckpoint
 from keras import optimizers
 from keras import layers
 from keras.applications.xception import Xception

 train_dir = './Plant_Seedlings_Classification/train/'
 valid_dir = './Plant_Seedlings_Classification/validation/'

 im_size = 299
 batch_size = 10
 train_num = 3803
 valid_num = 947

 conv_base = Xception(weights='imagenet',
                      include_top=False,
                      input_shape=(im_size, im_size, 3))

 conv_base.trainable = False

 model = models.Sequential()
 model.add(conv_base)
 model.add(layers.Flatten()) # Flatten
 model.add(layers.Dense(128, activation='relu'))
 model.add(layers.Dropout(0.5))
 model.add(layers.Dense(12, activation='softmax'))

 model.summary()

 train_datagen = ImageDataGenerator(rescale=1./255,
                                    rotation_range=30,
                                    width_shift_range=0.2,
                                    height_shift_range=0.2,
                                    zoom_range=0.2,
                                    horizontal_flip=True,
                                    fill_mode='nearest')


 valid_datagen = ImageDataGenerator(rescale=1./255)

 print("train gen")
 train_generator = train_datagen.flow_from_directory(
         train_dir,
         class_mode='categorical',
         target_size=(im_size, im_size),
         color_mode='rgb',
         batch_size=batch_size,)



 print("validation gen")


 validation_generator = valid_datagen.flow_from_directory(
         valid_dir,
         class_mode='categorical',
         target_size=(im_size, im_size),
         color_mode='rgb',
         batch_size=batch_size,)


 print("train indices",train_generator.class_indices)

 print("validation indices", validation_generator.class_indices)

 model.compile(loss='categorical_crossentropy',
               optimizer=optimizers.Nadam(lr=1e-4,
                                          beta_1=0.9,
                                          beta_2=0.999,
                                          epsilon=1e-08,
                                          schedule_decay=0.004),
               metrics=['acc'])


 steps_per_epoch = int(train_num/batch_size)+50
 validation_step = int(valid_num/batch_size)+1

 print("steps_per_epoch", steps_per_epoch)
 print("validation_step", validation_step)

 model_save_path = 'xception_299_0304_nomask_epoch{epoch:02d}_vacc{val_acc:.4f}.h5'


 history = model.fit_generator(
       train_generator,
       steps_per_epoch=steps_per_epoch,
       epochs=50,
       validation_data=validation_generator,
       validation_steps=validation_step,
       callbacks=[TensorBoard(log_dir='/tmp/tensorflow/log/1'),
                  ModelCheckpoint(filepath=model_save_path,
                                  monitor='val_acc',
                                  save_best_only=True,
                                  mode='max')])

The code runs for 50 epochs.

When the kernel runs on my friend's PC (i7-7700, GTX 1060, 8 GB DDR4-2400), each epoch takes roughly 90 seconds.

When it runs on my PC (i5-7400, GTX 1070 Ti, 16 GB DDR4-2400), each epoch takes about 120 seconds.
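For a rough comparison, the per-epoch throughput can be worked out from the kernel's own settings (steps_per_epoch = int(3803/10) + 50 = 430 batches of 10 images). The snippet below is just that arithmetic, not part of the original kernel:

 # Back-of-the-envelope throughput from the numbers reported above.
 steps_per_epoch = int(3803 / 10) + 50      # 430 batches, as in the kernel
 images_per_epoch = steps_per_epoch * 10    # 4300 images per epoch

 for label, epoch_seconds in [("friend's PC (GTX 1060)", 90),
                              ("my PC (GTX 1070 Ti)", 120)]:
     print("%s: %.1f images/s" % (label, images_per_epoch / epoch_seconds))
 # friend's PC (GTX 1060): 47.8 images/s
 # my PC (GTX 1070 Ti): 35.8 images/s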

We both use tensorflow_gpu to run this kernel. My question is: why is the GPU computation on my PC so much slower than on my friend's?
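As a quick sanity check (a minimal sketch, assuming the TensorFlow 1.x + Keras 2.x setup used here), one can confirm that Keras is on the tensorflow backend and that TensorFlow actually registers the 1070 Ti:

 import keras
 from tensorflow.python.client import device_lib

 print(keras.backend.backend())           # should print 'tensorflow'
 print(device_lib.list_local_devices())   # should list a /device:GPU:0 entry for the GTX 1070 Ti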

We checked GPU usage with nvidia-smi. While the kernel is running, Volatile GPU-Util on my GPU is only 30% to 60%, whereas on my friend's it stays around 95%.

When I run other ML frameworks (such as Caffe, or TensorFlow without the Keras API), Volatile GPU-Util on my GPU can reach ~100%.
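Since Caffe and plain TensorFlow saturate the same GPU, one thing worth ruling out (a hypothetical diagnostic, not part of the original kernel) is the Python-side ImageDataGenerator pipeline: in Keras 2.x, fit_generator feeds batches from a single worker by default, so a slower CPU can leave the GPU waiting for data. Increasing the generator workers changes only the input pipeline; if it raises Volatile GPU-Util, the bottleneck is data preparation rather than the GPU itself. A sketch of the same fit_generator call as above, reusing the model and generators already defined:

 # Same fit_generator call as above, with more input-pipeline workers.
 # workers / use_multiprocessing / max_queue_size are standard Keras 2.x
 # fit_generator arguments; the values here are just a starting point.
 history = model.fit_generator(
       train_generator,
       steps_per_epoch=steps_per_epoch,
       epochs=50,
       validation_data=validation_generator,
       validation_steps=validation_step,
       workers=4,                  # parallel generator workers instead of the default 1
       use_multiprocessing=True,   # run workers as processes to sidestep the GIL
       max_queue_size=20,          # keep more prepared batches queued ahead of the GPU
       callbacks=[TensorBoard(log_dir='/tmp/tensorflow/log/1'),
                  ModelCheckpoint(filepath=model_save_path,
                                  monitor='val_acc',
                                  save_best_only=True,
                                  mode='max')])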

Any pointers? Thanks.

0 answers:

No answers yet.