Nested tf.function calls are very slow

Asked: 2019-06-11 16:04:55

Tags: python tensorflow tensorflow2.0

In a function decorated with tf.function, I call another function that is also decorated with tf.function. The result is terribly slow.

Is that because I shouldn't use Python native types inside the function? Related question: Tensorflow 2.0 model using tf.function very slow and is recompiling every time the train count changes. Eager runs about 4x faster

Test:

import numpy as np
import tensorflow as tf


@tf.function
def loop(x, y):
    for i in range(1000):
        x.assign_add(y)
    return x


@tf.function
def loop2(x, y):
    for i in range(1000):
        loop(x, y)
    return x


def main():
    print("TensorFlow version: {}".format(tf.__version__))
    print("Eager execution: {}".format(tf.executing_eagerly()))

    x = tf.Variable(initial_value=0, dtype=np.float32)
    y = tf.Variable(initial_value=1, dtype=np.float32)

    # print(loop2(x, y))  # horribly slow

    for i in range(1000):  # faster
        loop(x, y)


main()

2 answers:

Answer 0 (score: 3):

You should read part 3 of the article referenced in the answer to the question you linked.

In part 3 you can see that the problem arises not only when using Python native types, but also when using Python constructs (such as for) on Python types instead of on TensorFlow objects (such as tf.Tensor).

In particular, when looping over range instead of tf.range, you are building a huge graph, because the loop body is repeated (unrolled) 1000 times.

If you replace range with tf.range, everything runs much faster.
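As an illustration of this unrolling (a pure-Python sketch of the idea, not TensorFlow's actual graph builder; the helper names here are made up): a Python range loop is executed while the graph is being traced, emitting one op per iteration, whereas a tf.range loop is converted into a single symbolic loop op.

```python
# Pure-Python sketch of trace-time unrolling vs. a symbolic loop.
# "Ops" are just strings here; real TensorFlow emits graph nodes.

def trace_python_range(n):
    """A Python range runs at trace time: one graph op per iteration."""
    graph = []
    for _ in range(n):          # this loop executes during tracing
        graph.append("AssignAddVariableOp")
    return graph

def trace_tf_range(n):
    """A tf.range loop becomes a single symbolic while-loop op."""
    return [f"While(body=AssignAddVariableOp, iters={n})"]

print(len(trace_python_range(1000)))  # 1000 ops -> huge graph, slow to build
print(len(trace_tf_range(1000)))      # 1 op -> small graph, fast to build
```

This is why the nested version is so slow: the outer unrolled loop repeats the inner unrolled loop, multiplying the graph size.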

Proof.

Your code (with timing, and 100 iterations instead of 1000):

import numpy as np
import tensorflow as tf
from time import time

@tf.function
def loop(x, y):
    for i in range(100):
        x.assign_add(y)
    return x


@tf.function
def loop2(x, y):
    for i in range(100):
        loop(x, y)
    return x


def main():
    print("TensorFlow version: {}".format(tf.__version__))
    print("Eager execution: {}".format(tf.executing_eagerly()))

    x = tf.Variable(initial_value=0, dtype=np.float32)
    y = tf.Variable(initial_value=1, dtype=np.float32)
    print("one")
    start = time()
    print(loop2(x, y))  # horribly slow
    print("end: ", time() - start)
    print("second: ")
    start = time()
    for i in range(100):  # faster
        loop(x, y)
    print("end: ", time() - start)


main()

Output:

TensorFlow version: 2.0.0-beta0
Eager execution: True
one
tf.Tensor(10000.0, shape=(), dtype=float32)
end:  86.44128751754761
second: 
end:  0.08476066589355469

The code updated to use only TensorFlow constructs:

@tf.function
def loop__(x, y):
    for i in tf.range(100):
        x.assign_add(y)
    return x


@tf.function
def loop2__(x, y):
    for i in tf.range(100):
        loop__(x, y)
    return x


def main():
    print("TensorFlow version: {}".format(tf.__version__))
    print("Eager execution: {}".format(tf.executing_eagerly()))

    x = tf.Variable(initial_value=0, dtype=np.float32)
    y = tf.Variable(initial_value=1, dtype=np.float32)
    print("one")
    start = time()
    print(loop2__(x, y))  # horribly slow
    print("end: ", time() - start)
    print("second: ")
    start = time()
    for i in tf.range(100):  # faster
        loop__(x, y)
    print("end: ", time() - start)


main()

Output:

TensorFlow version: 2.0.0-beta0
Eager execution: True
one
tf.Tensor(10000.0, shape=(), dtype=float32)
end:  0.4946322441101074
second: 
end:  0.24096465110778809

Answer 1 (score: 3):

A few notes about @tf.function (in layman's terms):

  1. @tf.function builds a callable graph of the function it decorates.
  2. That graph is looked up by a key derived from the function's input signature: a TensorSpec if the inputs are tensors, or a tuple of the actual argument values if they are not.
  3. On every call, the key is checked against all available callable graphs; on a match, the already-built graph is reused, otherwise the function is converted into a new callable graph and then invoked. Building the graph is called "tracing" the function in the docs. This is why a new graph is created every time you call the function with Python natives: each particular combination of input values is a key that does not exist yet, whereas with tensors the TensorSpec key is the same for every tensor of the same shape and dtype.
  4. If you use a Python iterator inside the function, the loop is unrolled while the function is being traced, creating a giant graph. If a TensorFlow equivalent such as tf.range is used, TensorFlow knows how to handle it without unrolling. There is an overhead the first time the function runs, but an unrolled loop is always faster than the loop itself. So the behavior you will notice is: with a Python iterable, the first run is very slow compared to the TensorFlow equivalent (tf.range), and the resulting graph consumes more memory on the accelerator, but all subsequent runs are noticeably faster, because the graph built from the Python iterable uses an unrolled loop.
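The caching behavior in points 2-3 can be sketched as a toy trace cache in plain Python (an illustration only; names like FakeTensor and cache_key are hypothetical, not TensorFlow internals): tensor arguments are keyed by (shape, dtype), while Python natives are keyed by their concrete value, so every new value forces a retrace.

```python
# Toy model of tf.function's trace cache (all names are illustrative only).

class FakeTensor:
    """Stand-in for a tensor: carries a value plus shape and dtype."""
    def __init__(self, value, shape=(), dtype="float32"):
        self.value, self.shape, self.dtype = value, shape, dtype

trace_count = 0
_cache = {}

def cache_key(arg):
    # Tensors are keyed by their spec (shape, dtype); Python natives
    # are keyed by their actual value.
    if isinstance(arg, FakeTensor):
        return ("TensorSpec", arg.shape, arg.dtype)
    return ("PyValue", arg)

def fake_tf_function(arg):
    global trace_count
    key = cache_key(arg)
    if key not in _cache:
        trace_count += 1              # "tracing": build a new graph
        _cache[key] = f"graph for {key}"
    return _cache[key]

fake_tf_function(1)                   # retrace (new Python value)
fake_tf_function(2)                   # retrace (another new Python value)
fake_tf_function(FakeTensor(1.0))     # retrace (first tensor spec seen)
fake_tf_function(FakeTensor(2.0))     # cache hit: same shape and dtype
print(trace_count)  # 3
```

With real tensors, calling the function a thousand times with different values reuses one graph; with Python ints, it would build a thousand graphs.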

Demo:

Using tf.range:

@tf.function
def loop__(x, y):
    for i in tf.range(10000):
        x.assign_add(y)
    return x


@tf.function
def loop2__(x, y):
    for i in tf.range(100):
        loop__(x, y)
    return x

x = tf.Variable(initial_value=0, dtype=np.float32)
y = tf.Variable(initial_value=1, dtype=np.float32)

start = time()
print(loop2__(x, y))
print("first run with tf.range", time() - start)
start = time()
print(loop2__(x, y))
print("second run with tf.range", time() - start)

output:
tf.Tensor(1000000.0, shape=(), dtype=float32)
first run with tf.range 10.322974920272827
tf.Tensor(2000000.0, shape=(), dtype=float32)
second run with tf.range 11.379822969436646

With Python range:

@tf.function
def loop__(x, y):
    for i in range(10000):
        x.assign_add(y)
    return x


@tf.function
def loop2__(x, y):
    for i in tf.range(100):
        loop__(x, y)
    return x

x = tf.Variable(initial_value=0, dtype=np.float32)
y = tf.Variable(initial_value=1, dtype=np.float32)

start = time()
print(loop2__(x, y))
print("first run with python range", time() - start)
start = time()
print(loop2__(x, y))
print("second run with python range", time() - start)

output (with loads of warnings about inefficient graph unrolling):
tf.Tensor(1000000.0, shape=(), dtype=float32)
first run with python range 51.13001751899719
tf.Tensor(2000000.0, shape=(), dtype=float32)
second run with python range 1.1093688011169434