TensorFlow data parallelism | only one of multiple GPUs has non-zero utilization

Time: 2019-06-27 04:48:43

Tags: tensorflow gpu

I have finished a multi-GPU version of word2vec, and I enabled log_device_placement in the code. The log shows that the ops have been placed on multiple GPUs:


2019-06-27 00:32:34.536178: I tensorflow/core/common_runtime/placer.cc:874] optimizer_7/gradients/loss_7/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:7
optimizer_6/gradients/loss_6/sampled_losses/Log1p_grad/add/x: (Const): /job:localhost/replica:0/task:0/device:GPU:6
2019-06-27 00:32:34.536188: I tensorflow/core/common_runtime/placer.cc:874] optimizer_6/gradients/loss_6/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:6
optimizer_5/gradients/loss_5/sampled_losses/Log1p_grad/add/x: (Const): /job:localhost/replica:0/task:0/device:GPU:5
2019-06-27 00:32:34.536202: I tensorflow/core/common_runtime/placer.cc:874] optimizer_5/gradients/loss_5/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:5
optimizer_4/gradients/loss_4/sampled_losses/Log1p_grad/add/x: (Const): /job:localhost/replica:0/task:0/device:GPU:4
2019-06-27 00:32:34.536216: I tensorflow/core/common_runtime/placer.cc:874] optimizer_4/gradients/loss_4/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:4
optimizer_3/gradients/loss_3/sampled_losses/Log1p_grad/add/x: (Const): /job:localhost/replica:0/task:0/device:GPU:3
2019-06-27 00:32:34.536231: I tensorflow/core/common_runtime/placer.cc:874] optimizer_3/gradients/loss_3/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:3
optimizer_2/gradients/loss_2/sampled_losses/Log1p_grad/add/x: (Const): /job:localhost/replica:0/task:0/device:GPU:2
2019-06-27 00:32:34.536246: I tensorflow/core/common_runtime/placer.cc:874] optimizer_2/gradients/loss_2/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:2
optimizer_1/gradients/loss_1/sampled_losses/Log1p_grad/add/x: (Const): /job:localhost/replica:0/task:0/device:GPU:1
2019-06-27 00:32:34.536273: I tensorflow/core/common_runtime/placer.cc:874] optimizer_1/gradients/loss_1/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:1
optimizer/gradients/loss/sampled_losses/Log1p_grad/add/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-06-27 00:32:34.536288: I tensorflow/core/common_runtime/placer.cc:874] optimizer/gradients/loss/sampled_losses/Log1p_grad/add/x: (Const)/job:localhost/replica:0/task:0/device:GPU:0
...

But nvidia-smi showed only one GPU actually doing work at the time:


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 36%   49C    P2    80W / 250W | 10882MiB / 11178MiB  |     26%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:06:00.0 Off |                  N/A |
| 29%   39C    P2    56W / 250W | 10631MiB / 11178MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:07:00.0 Off |                  N/A |
| 29%   36C    P2    54W / 250W | 10631MiB / 11178MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:08:00.0 Off |                  N/A |
| 29%   38C    P2    55W / 250W | 10631MiB / 11178MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  Off  | 00000000:0C:00.0 Off |                  N/A |
| 29%   38C    P2    55W / 250W | 10631MiB / 11178MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 108...  Off  | 00000000:0D:00.0 Off |                  N/A |
| 29%   33C    P2    55W / 250W | 10631MiB / 11178MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 108...  Off  | 00000000:0E:00.0 Off |                  N/A |
| 29%   37C    P2    55W / 250W | 10631MiB / 11178MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 108...  Off  | 00000000:0F:00.0 Off |                  N/A |
| 29%   36C    P2    54W / 250W | 10663MiB / 11178MiB  |      6%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     38130      C   python                                      8987MiB |
|    1     38130      C   python                                     10621MiB |
|    2     38130      C   python                                     10621MiB |
|    3     38130      C   python                                     10621MiB |
|    4     38130      C   python                                     10621MiB |
|    5     38130      C   python                                     10621MiB |
|    6     38130      C   python                                     10621MiB |
|    7     38130      C   python                                     10653MiB |
+-----------------------------------------------------------------------------+
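
Note that the near-full Memory-Usage on all eight GPUs does not by itself mean all eight are computing: by default, TensorFlow 1.x reserves almost all memory on every visible GPU whether or not kernels ever run there, so the GPU-Util column (26% on GPU 0, 0% on GPUs 1 to 6) is the meaningful signal here. As a minimal sketch, based on the allow_growth option that is commented out in the code below, memory usage can be made to reflect what each GPU actually touches:

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
# Allocate GPU memory on demand instead of reserving it all up front, so
# nvidia-smi's Memory-Usage column tracks real per-GPU usage:
config.gpu_options.allow_growth = True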

My source code is attached here:

...
    with tf.name_scope('inputs'):
        train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
        train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])

    upper = 4
    for i in range(0,upper):
        with tf.device(tf.DeviceSpec(device_type="GPU", device_index=i)):
            data_size = batch_size / upper
            data_size = int(data_size)
            print(data_size)
            _train_inputs = train_inputs[i * data_size : (i + 1) * data_size]
            _train_labels = train_labels[i * data_size : (i + 1) * data_size]
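            # Each GPU tower gets an equal, contiguous slice of the feed batch:
            # batch_size/upper examples per tower (for example, 32 per tower
            # if batch_size were 128; an illustrative value, not from the post).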


            with tf.name_scope('embeddings'):
                if prev_emb_model == '0': 
                    embeddings = tf.Variable(
                        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
                else:
                    add_on_emb = tf.random_uniform([vocabulary_size - len(emb), embedding_size], -1.0, 1.0)
                    embeddings = tf.concat([emb, add_on_emb], 0)
                embed = tf.nn.embedding_lookup(embeddings, _train_inputs)

            # Construct the variables for the NCE loss
            with tf.name_scope('weights'):
                nce_weights = tf.Variable(
                    tf.truncated_normal([vocabulary_size, embedding_size],
                                        stddev=1.0 / math.sqrt(embedding_size)))
            with tf.name_scope('biases'):
                nce_biases = tf.Variable(tf.zeros([vocabulary_size]))
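            # NOTE: embeddings, nce_weights and nce_biases above are created
            # afresh on every pass through the loop, so each GPU tower holds
            # its own independent copy of the parameters; nothing is shared
            # across towers.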

            # Compute the average NCE loss for the batch.
            # tf.nn.nce_loss automatically draws a new sample of the negative
            # labels each time we evaluate the loss.
            # Explanation of the meaning of NCE loss:
            #   http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/

            # with tf.device(tf.DeviceSpec(device_type="GPU", device_index=0)):
            with tf.name_scope('loss'):
                loss = tf.reduce_mean(
                    tf.nn.nce_loss(
                        weights=nce_weights,
                        biases=nce_biases,
                        labels=_train_labels,
                        inputs=embed,
                        num_sampled=num_sampled,
                        num_classes=vocabulary_size))


            # Construct the SGD optimizer using a learning rate of 1.0.
            with tf.name_scope('optimizer'):
                optimizer = tf.train.GradientDescentOptimizer(
                    1.0).minimize(loss, colocate_gradients_with_ops=True)
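
            # NOTE: loss and optimizer are plain Python names that are re-bound
            # on every iteration, so after this loop they refer only to the
            # last tower's ops.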


    # Compute the cosine similarity between minibatch examples and all
    # embeddings.
    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keepdims=True))
    normalized_embeddings = embeddings / norm

    # Add variable initializer.
    init = tf.global_variables_initializer()


config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
# config.gpu_options.allow_growth = True
with tf.Session(graph=graph, config=config) as session:

    #  We must initialize all variables before we use them.
    init.run()
    print('Initialized')
    average_loss = 0

    walks_data = []
    for w in walks:
        for n in w: 
            walks_data.append(n)

    for step in range(args.iter):
        print(step)

        batch_inputs, batch_labels = generate_batch(batch_size, 1,
                                                    window_size, walks_data)

        feed_dict = {train_inputs: batch_inputs, train_labels: batch_labels}


        _, loss_val = session.run([optimizer, loss],
                                  feed_dict=feed_dict,
                                  run_metadata=run_metadata)
        average_loss += loss_val

        if step % 2000 == 0:
        ...

    final_embeddings = normalized_embeddings.eval()
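
For reference, session.run([optimizer, loss]) above only fetches whatever ops the Python names optimizer and loss point to after the placement loop, i.e. the last tower plus its dependencies. Below is a minimal sketch of fetching every tower in a single step, assuming the build loop were changed to collect the per-tower ops into lists (tower_optimizers and tower_losses are hypothetical names, not in the original code):

    # Inside the graph-building loop, collect each tower's ops:
    tower_optimizers, tower_losses = [], []
    for i in range(0, upper):
        with tf.device(tf.DeviceSpec(device_type="GPU", device_index=i)):
            ...  # build _train_inputs, loss and optimizer exactly as above
            tower_optimizers.append(optimizer)
            tower_losses.append(loss)

    # In the training loop, fetch all towers in one step; Session.run accepts
    # nested fetch structures and returns results in the same shape (fetching
    # an Operation yields None, so the first element is a list of Nones):
    _, loss_vals = session.run([tower_optimizers, tower_losses],
                               feed_dict=feed_dict)
    average_loss += sum(loss_vals) / len(loss_vals)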

0 Answers:

No answers yet.