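I am using the TensorFlow Python API to write a reinforcement learning agent. During a run I need to call TensorFlow millions of times to evaluate tensors that are fed in from outside TensorFlow (no training involved), and on top of that I also have to feed it a large training set. Each evaluation call takes more than 1 ms (GTX 970 GPU, i7-4790K CPU), and together these calls account for more than half of my code's running time. Is there some way to reduce the time needed to evaluate data without training, or to reduce the overhead of calling TensorFlow?
One thing that might be slowing the evaluation down is that I use two inputs: I apply a convolutional layer to one of them, then join the result to the other with tf.concat before applying some dense layers. Would this make evaluation particularly slow?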
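To tell whether the time goes into the graph itself (convolution, `tf.concat`, dense layers) or into fixed per-call overhead, one possible check is to time the same call at several batch sizes: if the time per call barely changes between 1 row and 64 rows, the fixed overhead probably dominates. Below is a minimal sketch of such a measurement. `time_per_call` and `dummy_eval` are hypothetical names, and `dummy_eval` is a pure-Python stand-in for the real `sess.run` call, so the numbers it prints say nothing about TensorFlow itself.

```python
import time


def time_per_call(fn, n_calls=1000, warmup=10):
    """Return the mean wall-clock time of fn() in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are excluded from the measurement
    start = time.perf_counter()
    for _ in range(n_calls):
        fn()
    return (time.perf_counter() - start) / n_calls


def dummy_eval(batch):
    # Hypothetical stand-in for the network: one output per input row.
    return [sum(row) for row in batch]


timings = {}
for batch_size in (1, 8, 64):
    batch = [[1.0] * 16 for _ in range(batch_size)]
    timings[batch_size] = time_per_call(lambda: dummy_eval(batch), n_calls=200)

for batch_size, seconds in timings.items():
    print("batch %3d: %.1f us per call, %.2f us per row"
          % (batch_size, seconds * 1e6, seconds * 1e6 / batch_size))
```

With the real network, `dummy_eval(batch)` would be replaced by the evaluation call, fed with `batch_size` rows for each input.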
Some example code from my graph definition:
x1 = tf.placeholder(tf.float32, shape=[None, self.conv_size_0*self.conv_size_1], name="x1")
x2 = tf.placeholder(tf.float32, shape=[None, self.noconv_size], name="x2")
y_ = tf.placeholder(tf.float32, shape=[None, 1])
# Convolution in 1D image of length self.conv_size_0 and number of channels self.conv_size_1 (defined during graph creation, and then fixed)
x_image = tf.reshape(x1,[-1,self.conv_size_0,1,self.conv_size_1])
W_conv1 = weight_variable([3, 1, self.conv_size_1, self.conv_depth])
b_conv1 = bias_variable([self.conv_depth])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
# Concatenating
x1convsize = self.conv_depth*self.conv_size_0
x1conv = tf.reshape(h_conv1,[-1,x1convsize])
sa_size = x1convsize + self.noconv_size
x = tf.concat([x1conv,x2],1)
# Here I omitted code defining some dense layers acting on x
y_f = ...
predDiff = tf.subtract(y_f,y_)
loss = tf.nn.l2_loss(predDiff)
train_step = tf.train.AdamOptimizer(5e-4).minimize(loss)
Then, to evaluate my function, I make the following call millions of times:
[Q] = sess.run([self.y_f], feed_dict={
    self.x1: x1, self.x2: x2, self.keep_prob: 1.0})
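One possible way to make fewer calls, assuming the agent can collect several states before it needs their values, is to stack the pending inputs and evaluate them together, so the fixed per-call cost is paid once per batch instead of once per row. Below is a minimal sketch of that idea. `evaluate_in_batches`, `run_batch` and `fake_run_batch` are hypothetical names; `run_batch` stands in for a thin wrapper around the `sess.run` call above, and the fake version only exists so the sketch runs without TensorFlow.

```python
def evaluate_in_batches(run_batch, x1_rows, x2_rows, batch_size=256):
    """Evaluate all (x1, x2) pairs using as few run_batch calls as possible.

    run_batch is a hypothetical callable that takes two equally long lists
    of rows and returns one value per row.
    """
    if len(x1_rows) != len(x2_rows):
        raise ValueError("x1_rows and x2_rows must have the same length")
    results = []
    for start in range(0, len(x1_rows), batch_size):
        stop = start + batch_size
        # One call per chunk instead of one call per row.
        results.extend(run_batch(x1_rows[start:stop], x2_rows[start:stop]))
    return results


# Stand-in for the network: records the size of each batch it receives
# and returns sum(x1) + sum(x2) for each row.
calls = []


def fake_run_batch(x1_batch, x2_batch):
    calls.append(len(x1_batch))
    return [sum(a) + sum(b) for a, b in zip(x1_batch, x2_batch)]


x1_rows = [[float(i), 1.0] for i in range(10)]
x2_rows = [[2.0] for _ in range(10)]
q_values = evaluate_in_batches(fake_run_batch, x1_rows, x2_rows, batch_size=4)
print(len(calls), q_values[:3])  # prints: 3 [3.0, 4.0, 5.0]
```

Ten rows with `batch_size=4` need three calls instead of ten. Whether this applies depends on whether the agent's control flow allows several evaluations to be deferred and grouped.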