Question

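This is my first attempt at GPU computing, and of course I was hoping for a big speedup. However, with a basic example in TensorFlow, the result was actually worse:

On cpu:0, each of the ten runs takes 2 seconds on average; gpu:0 takes 2.7 seconds, and gpu:1 is 50% worse than cpu:0 at 3 seconds.

Here is the code: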

import tensorflow as tf
import numpy as np
import time
import random

for _ in range(10):
    with tf.Session() as sess:
        start = time.time()
        with tf.device('/gpu:0'):  # swap for '/cpu:0' or whatever
            # Each matrix is filled from 10^6 separate Python-level random.random() calls
            a = tf.constant([random.random() for _ in range(1000 * 1000)], shape=[1000, 1000], name='a')
            b = tf.constant([random.random() for _ in range(1000 * 1000)], shape=[1000, 1000], name='b')
            c = tf.matmul(a, b)
            d = tf.matmul(a, c)
            e = tf.matmul(a, d)
            f = tf.matmul(a, e)
            for _ in range(1000):
                sess.run(f)
        end = time.time()
        print(end - start)

What am I observing here? Could the run time be dominated by copying data between RAM and the GPU?

Answer 1

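The way you generate the data runs on the CPU (random.random() is a regular Python function, not a TF one). Also, calling it 10^6 times will be slower than requesting 10^6 random numbers in a single call. Change the code to: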

a = tf.random_uniform([1000, 1000], name='a')
b = tf.random_uniform([1000, 1000], name='b')

This way the data is generated in parallel on the GPU, and no time is wasted transferring it from RAM to the GPU.
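
To get a feel for the per-call overhead mentioned above, here is a minimal sketch that compares the two ways of producing a million random numbers on the CPU. It uses NumPy's np.random.rand as a stand-in for a single batched request; that choice and the helper names (generate_one_by_one, generate_in_one_call, timed) are assumptions for illustration only and are not part of the original code. Timings depend on the machine, so none are quoted here.

```python
import random
import time

import numpy as np

N = 1000 * 1000  # same element count as one 1000 x 1000 matrix


def generate_one_by_one(n):
    """Call random.random() n times from Python, as the question's code does."""
    return [random.random() for _ in range(n)]


def generate_in_one_call(n):
    """Request all n values in a single vectorized call (NumPy stand-in)."""
    return np.random.rand(n)


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call to fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


values_loop, t_loop = timed(generate_one_by_one, N)
values_batch, t_batch = timed(generate_in_one_call, N)

print("one call per number: %.4f s" % t_loop)
print("single batched call: %.4f s" % t_batch)
```

In the question's code, the one-by-one generation happens twice per run and sits inside the timed region, so it is counted in every measurement regardless of which device does the matmul.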

Tensorflow matmul calculations on GPU are slower than on CPU
