My friend and I entered a competition called "Plant Seedlings Classification". He built a Keras kernel as follows:
import numpy as np
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
from os import listdir, makedirs
from os.path import join, exists, expanduser
from keras import models
from keras.preprocessing import image
from keras.applications import xception
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import TensorBoard
from keras.callbacks import ModelCheckpoint
from keras import optimizers
from keras import layers
from keras.applications.xception import Xception
train_dir = './Plant_Seedlings_Classification/train/'
valid_dir = './Plant_Seedlings_Classification/validation/'
im_size = 299
batch_size = 10
train_num = 3803
valid_num = 947
conv_base = Xception(weights='imagenet',
                     include_top=False,
                     input_shape=(im_size, im_size, 3))
conv_base.trainable = False
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten()) # Flatten
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(12, activation='softmax'))
model.summary()
train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=30,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   fill_mode='nearest')
valid_datagen = ImageDataGenerator(rescale=1./255)
print("train gen")
train_generator = train_datagen.flow_from_directory(
    train_dir,
    class_mode='categorical',
    target_size=(im_size, im_size),
    color_mode='rgb',
    batch_size=batch_size)
print("validation gen")
validation_generator = valid_datagen.flow_from_directory(
    valid_dir,
    class_mode='categorical',
    target_size=(im_size, im_size),
    color_mode='rgb',
    batch_size=batch_size)
print("train indices", train_generator.class_indices)
print("validation indices", validation_generator.class_indices)
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Nadam(lr=1e-4,
                                         beta_1=0.9,
                                         beta_2=0.999,
                                         epsilon=1e-08,
                                         schedule_decay=0.004),
              metrics=['acc'])
steps_per_epoch = int(train_num/batch_size)+50
validation_step = int(valid_num/batch_size)+1
print("steps_per_epoch", steps_per_epoch)
print("validation_step", validation_step)
model_save_path = 'xception_299_0304_nomask_epoch{epoch:02d}_vacc{val_acc:.4f}.h5'
history = model.fit_generator(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=validation_step,
    callbacks=[TensorBoard(log_dir='/tmp/tensorflow/log/1'),
               ModelCheckpoint(filepath=model_save_path,
                               monitor='val_acc',
                               save_best_only=True,
                               mode='max')])
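As a quick check of the steps arithmetic in the kernel, here is the same calculation in plain Python:

```python
# Worked check of the steps arithmetic used in the kernel above.
train_num = 3803
valid_num = 947
batch_size = 10

# int(3803 / 10) = 380, plus the extra 50 steps the kernel adds,
# so each training epoch pulls 430 batches from the (looping) generator.
steps_per_epoch = int(train_num / batch_size) + 50
validation_step = int(valid_num / batch_size) + 1

print(steps_per_epoch)   # 430
print(validation_step)   # 95
```

Note that the +50 means each epoch draws 4,300 samples from a 3,803-image training set, so the generator wraps around and some images are seen twice per epoch; this lengthens every epoch, but equally on both machines.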
The code runs for 50 epochs.
When the kernel runs on my friend's PC (i7-7700, GTX 1060, 8 GB DDR4-2400), each epoch takes about 90 seconds. When it runs on my PC (i5-7400, GTX 1070 Ti, 16 GB DDR4-2400), each epoch takes about 120 seconds.
We both use tensorflow_gpu to run this kernel. My question is: why is the GPU computation on my PC so much slower than on my friend's?
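For reference, per-epoch wall-clock time can be recorded with a small Keras-style callback. This is a minimal standalone sketch: in the real kernel it would subclass `keras.callbacks.Callback` and be added to the `callbacks=[...]` list of `fit_generator`; here it is a plain object so the timing logic runs on its own.

```python
import time

class EpochTimer(object):
    """Records wall-clock seconds per epoch.

    In the actual kernel this would subclass keras.callbacks.Callback;
    it is a plain object here so the example is self-contained.
    """
    def __init__(self):
        self.epoch_times = []

    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_times.append(time.time() - self._start)

# Standalone usage with a dummy "epoch":
timer = EpochTimer()
timer.on_epoch_begin(0)
time.sleep(0.1)              # stand-in for one epoch of training
timer.on_epoch_end(0)
print(len(timer.epoch_times))  # 1
```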
We checked GPU usage with nvidia-smi. While the kernel is running, the Volatile GPU-Util on my GPU is only 30% to 60%, whereas on my friend's it stays around 95%.
When I run other ML frameworks (such as Caffe, or TensorFlow without the Keras API), Volatile GPU-Util reaches ~100% on my GPU.
Any pointers? Thanks.
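One thing we thought of trying: timing how fast the generator alone can produce batches, with the GPU out of the loop. If the CPU-side decoding and augmentation cannot keep up with the GPU's batch rate, Volatile GPU-Util drops. A minimal sketch; `dummy_batches` is a hypothetical stand-in for `train_generator`:

```python
import time

def time_generator(gen, n_batches):
    """Return seconds taken to pull n_batches from gen (no GPU involved)."""
    start = time.time()
    for _ in range(n_batches):
        next(gen)
    return time.time() - start

def dummy_batches():
    # Stand-in for train_generator; in the real kernel one would pass the
    # flow_from_directory iterator here and compare against the epoch time.
    while True:
        yield [[0] * 10]

elapsed = time_generator(dummy_batches(), 100)
print(elapsed >= 0.0)  # True
```

If pulling 430 batches this way takes a large fraction of the 120-second epoch, the input pipeline (which `fit_generator` runs on the CPU) would be a plausible bottleneck; in Keras 2, `fit_generator` also accepts a `workers` argument to parallelize the generator.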