I am trying to train a CNN model using data parallelism. For simplicity, here is a snippet of the train method:
import tensorflow as tf

NUM_GPUS = 4

def train_model(file_loc='', data_details='', epochs='', checkpoint='', learning_rate=1e-5):
    train_initializer = data_details['train_it_init']
    test_initializer = data_details['test_it_init']
    iterator = data_details['iterator']
    # One pending batch per GPU, all drawn from the shared iterator.
    batch_list = [iterator.get_next() for _ in range(NUM_GPUS)]
    with tf.device('/cpu:0'):
        num_class = 14
        tower_grads = []
        tower_losses = []
        opt = tf.train.AdamOptimizer(learning_rate)
        dropout_rate = tf.placeholder(tf.float32)
        for i in range(NUM_GPUS):
            # Re-enter the top-level variable scope so all towers share weights.
            with tf.variable_scope(tf.get_variable_scope()):
                with tf.device('/gpu:{}'.format(i)), tf.name_scope('tower_{}'.format(i)) as scope:
                    xbatch, ybatch = batch_list[i]
                    total_loss = tower_loss(scope, xbatch, ybatch, dropout_rate=dropout_rate)
                    # Keep this tower's gradients; they are averaged on the CPU below.
                    tower_grads.append(opt.compute_gradients(total_loss))
        avg_grads = average_gradients(tower_grads)
        train_op = opt.apply_gradients(avg_grads)
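For context, average_gradients (definition not shown above) averages each variable's gradients across the towers, essentially the standard pattern from the TensorFlow CIFAR-10 multi-GPU tutorial; my version is along these lines (minimal sketch, assuming every tower produces a gradient for every variable):

    def average_gradients(tower_grads):
        # tower_grads: one list of (gradient, variable) pairs per tower.
        average_grads = []
        for grad_and_vars in zip(*tower_grads):
            # grad_and_vars pairs up the same variable across all towers.
            grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
            mean_grad = tf.reduce_mean(tf.concat(grads, axis=0), axis=0)
            average_grads.append((mean_grad, grad_and_vars[0][1]))
        return average_grads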
I am trying to use 4 GPUs (NUM_GPUS), but the problem is that only one GPU is active at a time; the GPUs seem to do their work sequentially, and I don't understand why. I am using the Dataset API, the iterator and the iterator initializers are in the data_details dictionary (tested and working), and the model inference is invoked inside the tower_loss(..) call, under the variable and name scopes for that tower. Any suggestions would be very helpful.
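In case it is relevant, data_details is built roughly like this (simplified sketch; make_data_details and the dataset arguments are paraphrased here, the real pipeline does more):

    def make_data_details(train_ds, test_ds, batch_size=32):
        # One re-initializable iterator shared by the train and test splits.
        train_ds = train_ds.batch(batch_size).prefetch(NUM_GPUS)
        test_ds = test_ds.batch(batch_size).prefetch(NUM_GPUS)
        iterator = tf.data.Iterator.from_structure(
            train_ds.output_types, train_ds.output_shapes)
        return {
            'iterator': iterator,
            'train_it_init': iterator.make_initializer(train_ds),
            'test_it_init': iterator.make_initializer(test_ds),
        }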
My GPU usage looks like this:
[screenshot: gpu usage]