Inside a function decorated with tf.function I try to call another function that is also decorated with tf.function, and the result is horribly slow. Is that because I should not be using Python native types inside the function? (Related question: "Tensorflow 2.0 model using tf.function very slow and is recompiling every time the train count changes. Eager runs about 4x faster".)

Test:
import numpy as np
import tensorflow as tf

@tf.function
def loop(x, y):
    for i in range(1000):
        x.assign_add(y)
    return x

@tf.function
def loop2(x, y):
    for i in range(1000):
        loop(x, y)
    return x

def main():
    print("TensorFlow version: {}".format(tf.__version__))
    print("Eager execution: {}".format(tf.executing_eagerly()))
    x = tf.Variable(initial_value=0, dtype=np.float32)
    y = tf.Variable(initial_value=1, dtype=np.float32)
    # print(loop2(x, y))  # horribly slow
    for i in range(1000):  # faster
        loop(x, y)

main()
Answer 0 (score: 3)
You should read the article cited in the linked answer, in particular part 3.

In part 3 you can see that the problem is not only the use of Python native types, but also the use of Python constructs (such as for) that operate on Python types instead of on tf.Tensor objects.

In particular, when you loop over range instead of tf.range, you build a huge graph, because the loop body is repeated (unrolled) 1000 times.

If you replace range with tf.range, everything becomes much faster.
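One way to see this unrolling directly, beyond timing, is to count the operations in the traced graphs. This is my own minimal sketch (not part of the answer); it assumes the TF 2.x ConcreteFunction API, and the exact operation counts vary by version:

import numpy as np
import tensorflow as tf

@tf.function
def loop_py(x, y):
    for i in range(100):       # Python range: unrolled while tracing
        x.assign_add(y)
    return x

@tf.function
def loop_tf(x, y):
    for i in tf.range(100):    # tf.range: lowered to a single graph while-loop
        x.assign_add(y)
    return x

x = tf.Variable(initial_value=0, dtype=np.float32)
y = tf.Variable(initial_value=1, dtype=np.float32)

# get_concrete_function traces the function once and exposes the resulting graph
print(len(loop_py.get_concrete_function(x, y).graph.get_operations()))  # roughly one op per unrolled iteration
print(len(loop_tf.get_concrete_function(x, y).graph.get_operations()))  # only a handful of ops

The Python-range graph grows linearly with the iteration count, which is exactly why tracing (the first call) becomes so slow.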
Proof.

Your code (with time measurements, and 100 iterations instead of 1000):
import numpy as np
import tensorflow as tf
from time import time

@tf.function
def loop(x, y):
    for i in range(100):
        x.assign_add(y)
    return x

@tf.function
def loop2(x, y):
    for i in range(100):
        loop(x, y)
    return x

def main():
    print("TensorFlow version: {}".format(tf.__version__))
    print("Eager execution: {}".format(tf.executing_eagerly()))
    x = tf.Variable(initial_value=0, dtype=np.float32)
    y = tf.Variable(initial_value=1, dtype=np.float32)
    print("one")
    start = time()
    print(loop2(x, y))  # horribly slow
    print("end: ", time() - start)
    print("second: ")
    start = time()
    for i in range(100):  # faster
        loop(x, y)
    print("end: ", time() - start)

main()
Output:
TensorFlow version: 2.0.0-beta0
Eager execution: True
one
tf.Tensor(10000.0, shape=(), dtype=float32)
end: 86.44128751754761
second:
end: 0.08476066589355469
Updated code using only TensorFlow methods:
@tf.function
def loop__(x, y):
    for i in tf.range(100):
        x.assign_add(y)
    return x

@tf.function
def loop2__(x, y):
    for i in tf.range(100):
        loop__(x, y)
    return x

def main():
    print("TensorFlow version: {}".format(tf.__version__))
    print("Eager execution: {}".format(tf.executing_eagerly()))
    x = tf.Variable(initial_value=0, dtype=np.float32)
    y = tf.Variable(initial_value=1, dtype=np.float32)
    print("one")
    start = time()
    print(loop2__(x, y))  # fast now
    print("end: ", time() - start)
    print("second: ")
    start = time()
    for i in tf.range(100):  # faster
        loop__(x, y)
    print("end: ", time() - start)

main()
Output:
TensorFlow version: 2.0.0-beta0
Eager execution: True
one
tf.Tensor(10000.0, shape=(), dtype=float32)
end: 0.4946322441101074
second:
end: 0.24096465110778809
Answer 1 (score: 3)
A few things to note about @tf.function (in layman's terms):

- It wraps your function into a graph whose signature is the TensorSpec of its tensor inputs; if an input to the function is not a tensor, the signature is a tuple holding the argument's actual value.
- The TensorSpec key is the same for every tensor with the same shape and dtype, so such inputs do not cause the function to be retraced (a small retracing sketch is appended after the demonstrations below).
- If your loop runs over a TensorFlow tf.range, TensorFlow knows how to handle it without unrolling. There is an overhead the first time the function runs, but an unrolled loop is always faster than the loop construct itself. So the behaviour you will notice is: compared with the TensorFlow equivalent (tf.range), the first run with a Python iterable is very slow and the graph it creates consumes more memory on the accelerator, but all subsequent runs are noticeably faster, because the graph built from the Python iterable contains the fully unrolled loop.

Demonstration:
With tf.range:
@tf.function
def loop__(x, y):
    for i in tf.range(10000):
        x.assign_add(y)
    return x

@tf.function
def loop2__(x, y):
    for i in tf.range(100):
        loop__(x, y)
    return x

x = tf.Variable(initial_value=0, dtype=np.float32)
y = tf.Variable(initial_value=1, dtype=np.float32)

start = time()
print(loop2__(x, y))
print("first run with tf.range", time() - start)
start = time()
print(loop2__(x, y))
print("second run with tf.range", time() - start)
output:
tf.Tensor(1000000.0, shape=(), dtype=float32)
first run with tf.range 10.322974920272827
tf.Tensor(2000000.0, shape=(), dtype=float32)
second run with tf.range 11.379822969436646
With Python range:
@tf.function
def loop__(x, y):
    for i in range(10000):
        x.assign_add(y)
    return x

@tf.function
def loop2__(x, y):
    for i in tf.range(100):
        loop__(x, y)
    return x

x = tf.Variable(initial_value=0, dtype=np.float32)
y = tf.Variable(initial_value=1, dtype=np.float32)

start = time()
print(loop2__(x, y))
print("first run with python range", time() - start)
start = time()
print(loop2__(x, y))
print("second run with python range", time() - start)
output (with loads of warnings about inefficient graph unrolling):
tf.Tensor(1000000.0, shape=(), dtype=float32)
first run with python range 51.13001751899719
tf.Tensor(2000000.0, shape=(), dtype=float32)
second run with python range 1.1093688011169434
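Finally, to illustrate the TensorSpec / retracing point from the notes above, here is a small sketch of my own (not part of either answer). A Python-side print only runs while tf.function is tracing, so it shows when a new input signature forces a retrace:

import tensorflow as tf

@tf.function
def add_one(x):
    print("tracing with", x)   # executes only during tracing, not on every call
    return x + 1

add_one(tf.constant(1.0))   # traces once for TensorSpec(shape=(), dtype=float32)
add_one(tf.constant(2.0))   # same shape and dtype -> cached graph is reused, no print
add_one(3)                  # Python int -> the value itself is the signature, so it retraces
add_one(4)                  # another Python int -> retraces again

This is why passing Python scalars into a tf.function (for example, a train-step count that changes every call) triggers repeated, expensive retracing, while tensors with a stable shape and dtype reuse the cached graph.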