I have implemented a multi-GPU version of word2vec, and I enabled log_device_placement in the code. The resulting log shows that operations have been placed on multiple GPUs:
2019-06-27 00:32:34.536178: I tensorflow/core/common_runtime/placer.cc:874] optimizer_7/gradients/loss_7/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:7
optimizer_6/gradients/loss_6/sampled_losses/Log1p_grad/add/x: (Const): /job:localhost/replica:0/task:0/device:GPU:6
2019-06-27 00:32:34.536188: I tensorflow/core/common_runtime/placer.cc:874] optimizer_6/gradients/loss_6/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:6
optimizer_5/gradients/loss_5/sampled_losses/Log1p_grad/add/x: (Const): /job:localhost/replica:0/task:0/device:GPU:5
2019-06-27 00:32:34.536202: I tensorflow/core/common_runtime/placer.cc:874] optimizer_5/gradients/loss_5/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:5
optimizer_4/gradients/loss_4/sampled_losses/Log1p_grad/add/x: (Const): /job:localhost/replica:0/task:0/device:GPU:4
2019-06-27 00:32:34.536216: I tensorflow/core/common_runtime/placer.cc:874] optimizer_4/gradients/loss_4/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:4
optimizer_3/gradients/loss_3/sampled_losses/Log1p_grad/add/x: (Const): /job:localhost/replica:0/task:0/device:GPU:3
2019-06-27 00:32:34.536231: I tensorflow/core/common_runtime/placer.cc:874] optimizer_3/gradients/loss_3/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:3
optimizer_2/gradients/loss_2/sampled_losses/Log1p_grad/add/x: (Const): /job:localhost/replica:0/task:0/device:GPU:2
2019-06-27 00:32:34.536246: I tensorflow/core/common_runtime/placer.cc:874] optimizer_2/gradients/loss_2/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:2
optimizer_1/gradients/loss_1/sampled_losses/Log1p_grad/add/x: (Const): /job:localhost/replica:0/task:0/device:GPU:1
2019-06-27 00:32:34.536273: I tensorflow/core/common_runtime/placer.cc:874] optimizer_1/gradients/loss_1/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:1
optimizer/gradients/loss/sampled_losses/Log1p_grad/add/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-06-27 00:32:34.536288: I tensorflow/core/common_runtime/placer.cc:874] optimizer/gradients/loss/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:0
...
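(The placement log comes from creating the session with log_device_placement=True, exactly as in the full code below:)

    config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
    with tf.Session(graph=graph, config=config) as session:
        ...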
But nvidia-smi showed that only one GPU was actually doing any work at the time:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 36%   49C    P2    80W / 250W | 10882MiB / 11178MiB  |     26%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:06:00.0 Off |                  N/A |
| 29%   39C    P2    56W / 250W | 10631MiB / 11178MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:07:00.0 Off |                  N/A |
| 29%   36C    P2    54W / 250W | 10631MiB / 11178MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:08:00.0 Off |                  N/A |
| 29%   38C    P2    55W / 250W | 10631MiB / 11178MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  Off  | 00000000:0C:00.0 Off |                  N/A |
| 29%   38C    P2    55W / 250W | 10631MiB / 11178MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 108...  Off  | 00000000:0D:00.0 Off |                  N/A |
| 29%   33C    P2    55W / 250W | 10631MiB / 11178MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 108...  Off  | 00000000:0E:00.0 Off |                  N/A |
| 29%   37C    P2    55W / 250W | 10631MiB / 11178MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 108...  Off  | 00000000:0F:00.0 Off |                  N/A |
| 29%   36C    P2    54W / 250W | 10663MiB / 11178MiB  |      6%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     38130      C   python                                      8987MiB |
|    1     38130      C   python                                     10621MiB |
|    2     38130      C   python                                     10621MiB |
|    3     38130      C   python                                     10621MiB |
|    4     38130      C   python                                     10621MiB |
|    5     38130      C   python                                     10621MiB |
|    6     38130      C   python                                     10621MiB |
|    7     38130      C   python                                     10653MiB |
+-----------------------------------------------------------------------------+
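So the process has memory allocated on all eight GPUs, yet only GPU 0 shows real utilization. TensorFlow itself does enumerate all the GPUs; a quick check along these lines (a minimal sketch using tensorflow.python.client.device_lib, which as far as I know lists the devices TensorFlow can see) should print /device:GPU:0 through /device:GPU:7:

    from tensorflow.python.client import device_lib

    # List every device visible to TensorFlow; on this machine the output
    # should include /device:GPU:0 through /device:GPU:7.
    for d in device_lib.list_local_devices():
        print(d.name, d.device_type)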
I attach my source code here:
...
with tf.name_scope('inputs'):
    train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])

upper = 4
for i in range(0, upper):
    with tf.device(tf.DeviceSpec(device_type="GPU", device_index=i)):
        # Give each GPU an equal slice of the batch.
        data_size = int(batch_size / upper)
        print(data_size)
        _train_inputs = train_inputs[i * data_size : (i + 1) * data_size]
        _train_labels = train_labels[i * data_size : (i + 1) * data_size]

        with tf.name_scope('embeddings'):
            if prev_emb_model == '0':
                embeddings = tf.Variable(
                    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
            else:
                # Extend a previously trained embedding matrix with random
                # rows for the words that are new to this vocabulary.
                add_on_emb = tf.random_uniform(
                    [vocabulary_size - len(emb), embedding_size], -1.0, 1.0)
                embeddings = tf.concat([emb, add_on_emb], 0)
            embed = tf.nn.embedding_lookup(embeddings, _train_inputs)

        # Construct the variables for the NCE loss.
        with tf.name_scope('weights'):
            nce_weights = tf.Variable(
                tf.truncated_normal([vocabulary_size, embedding_size],
                                    stddev=1.0 / math.sqrt(embedding_size)))
        with tf.name_scope('biases'):
            nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

        # Compute the average NCE loss for the batch.
        # tf.nce_loss automatically draws a new sample of the negative labels
        # each time we evaluate the loss.
        # Explanation of the meaning of NCE loss:
        # http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
        # with tf.device(tf.DeviceSpec(device_type="GPU", device_index=0)):
        with tf.name_scope('loss'):
            loss = tf.reduce_mean(
                tf.nn.nce_loss(
                    weights=nce_weights,
                    biases=nce_biases,
                    labels=_train_labels,
                    inputs=embed,
                    num_sampled=num_sampled,
                    num_classes=vocabulary_size))

        # Construct the SGD optimizer using a learning rate of 1.0.
        with tf.name_scope('optimizer'):
            optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(
                loss, colocate_gradients_with_ops=True)

# Compute the cosine similarity between minibatch examples and all embeddings.
norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keepdims=True))
normalized_embeddings = embeddings / norm

# Add variable initializer.
init = tf.global_variables_initializer()

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
# config.gpu_options.allow_growth = True

with tf.Session(graph=graph, config=config) as session:
    # We must initialize all variables before we use them.
    init.run()
    print('Initialized')

    average_loss = 0
    walks_data = []
    for w in walks:
        for n in w:
            walks_data.append(n)

    for step in range(args.iter):
        print(step)
        batch_inputs, batch_labels = generate_batch(batch_size, 1,
                                                    window_size, walks_data)
        feed_dict = {train_inputs: batch_inputs, train_labels: batch_labels}

        _, loss_val = session.run([optimizer, loss],
                                  feed_dict=feed_dict,
                                  run_metadata=run_metadata)
        average_loss += loss_val

        if step % 2000 == 0:
            ...

    final_embeddings = normalized_embeddings.eval()
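One detail that may matter: in the loop above, loss and optimizer are rebound on every iteration, so session.run([optimizer, loss]) only ever executes the train op built for the last GPU, even though the placer log shows ops for every tower. Below is a minimal, self-contained sketch (not my actual model; the dummy per-tower loss and the names tower_losses / train_ops / train_step are hypothetical, for illustration only) of how the per-tower ops could be collected so that a single session.run step drives all GPUs:

    import tensorflow as tf

    upper = 4
    tower_losses = []  # hypothetical: one loss tensor per GPU tower
    train_ops = []     # hypothetical: one minimize op per GPU tower

    for i in range(upper):
        with tf.device(tf.DeviceSpec(device_type="GPU", device_index=i)):
            # Stand-in for the per-tower NCE loss built in the real code.
            w = tf.Variable(tf.random_uniform([10], -1.0, 1.0))
            loss = tf.reduce_mean(tf.square(w))
            tower_losses.append(loss)
            train_ops.append(
                tf.train.GradientDescentOptimizer(1.0).minimize(
                    loss, colocate_gradients_with_ops=True))

    mean_loss = tf.reduce_mean(tf.stack(tower_losses))
    train_step = tf.group(*train_ops)  # one op that runs every tower's update

    # allow_soft_placement lets the sketch run even with fewer physical GPUs.
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
        sess.run(tf.global_variables_initializer())
        _, loss_val = sess.run([train_step, mean_loss])
        print(loss_val)

tf.group simply bundles the per-tower minimize ops into a single op, so running train_step forces the forward and backward pass on every tower in one step.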