Question

We have been investigating an issue in which we observe discrepancies in results caused by several factors that, in theory, should have no effect.

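In our problem we are dealing with data sets of varying size. During training, inputs are fixed to a specific size by padding or chopping. However, for performance and speed, we want to exploit the fact that we should be able to precompute the result for an input consisting only of padding passed through our convolutional layers. So for inputs much smaller than the fixed size used in training, we determine the minimum amount of padding needed for the activations produced by the final conv layer to match the activations of the same input padded to the large fixed size, except for the all-padding activations, which are precomputed and appended to the result.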
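To size that minimum padding, you need to know how many input positions each final-layer unit depends on. A minimal sketch, assuming every conv and maxpool uses VALID padding, padding is appended at the end of the input, and the (kernel, stride) values in `LAYERS` are hypothetical placeholders (the question does not give the real sizes); the helper names are also hypothetical:

```python
# Hypothetical (kernel_size, stride) pairs along the length axis. The real
# sizes are not given in the question; these are placeholders only:
# 8 blocks of conv(k=3, s=1) + maxpool(k=2, s=2), then a final conv(k=3, s=1).
LAYERS = [(3, 1), (2, 2)] * 8 + [(3, 1)]


def receptive_field(layers):
    """Return (rf, jump) for a stack of VALID convs/pools: rf is how many
    input positions one final-layer unit sees, jump is the input distance
    between adjacent final-layer units."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf, jump


def output_length(n, layers):
    """Output length after applying the stack to a length-n input."""
    for k, s in layers:
        if n < k:
            return 0
        n = (n - k) // s + 1
    return n


def min_padded_length(n, layers):
    """Shortest padded length whose final-layer units cover every unit that
    touches at least one of the n real input positions (padding at the end)."""
    rf, jump = receptive_field(layers)
    units = (n - 1) // jump + 1  # units whose window starts inside the real data
    return rf + (units - 1) * jump


print(receptive_field(LAYERS))          # (1278, 256)
print(min_padded_length(100, LAYERS))   # 1278
```

With these placeholder sizes, a 100-position input must be padded to 1278 so that its single final-layer unit sees the same window as in the fixed-size run; every later unit of the fixed-size run starts entirely inside the padding, which is what makes the all-pad activations precomputable.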

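The conv-net part of the model we are experimenting with consists of 9 layers: the first 8 each perform a conv2d followed by a maxpool, and the last is a single conv2d. Training was done on a GPU, but only a CPU is available for testing. Testing the variable-padding approach on a large amount of data mostly gave the same results we expected from the fixed-size input. In some cases, however, we observed discrepancies between the values obtained with the two approaches, sometimes on the order of 1e-4.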

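The issue appears to occur only for inputs at the small end of the size range. Digging into these cases showed that the activations of every layer in the model are always identical until you reach the final conv layer, where the discrepancy appears. Surprisingly, we found that adding enough padding (still far below the fixed size) can change the first few activations of the last layer, even though those units do not depend on the extra padding at all.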
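The layer-by-layer comparison described above can be automated. A minimal sketch, assuming each layer's activations have already been fetched and flattened so that the shorter run's activations are a prefix of the longer run's (e.g. length-major, channel-minor order); the helper names and toy data are hypothetical:

```python
def max_abs_diff(a, b):
    """Largest |a[i] - b[i]| over the overlapping prefix of two flat lists."""
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def first_divergent_layer(acts_a, acts_b, atol=0.0):
    """Return (layer_index, max_diff) for the first layer whose overlapping
    prefix differs by more than atol, or None if every layer agrees."""
    for i, (a, b) in enumerate(zip(acts_a, acts_b)):
        d = max_abs_diff(a, b)
        if d > atol:
            return i, d
    return None


# Toy per-layer activations (made up for illustration): run B has a longer
# input, so each of its layers has extra trailing values.
run_a = [[1.0, 2.0], [3.0], [60.87, 72.98]]
run_b = [[1.0, 2.0, 5.0], [3.0, 4.0], [60.87, 72.98001, 85.09]]
print(first_divergent_layer(run_a, run_b))             # layer 2, diff about 1e-05
print(first_divergent_layer(run_a, run_b, atol=1e-4))  # None
```

Passing a non-zero `atol` separates last-bit rounding noise from genuine mismatches.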

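We were able to reproduce the problem with synthetic data, and can even demonstrate it with a single conv2d operation. In the code below, a simple conv2d with one filter of length 5 and stride 1 is applied to a length-7 tensor, producing 3 activations. However, when you append 1 or 2 extra values to the input, the values of the 2nd and 3rd activations change, even though they do not depend on the new input values at all. Replicating the computation manually with tf.mul and reduce_sum ops produces the same result as the input with the extra values, indicating that this is the correct result. This is only observed when the ops execute on the CPU; when they are assigned to the GPU, they produce the same values regardless of whether the input length is 7, 8 or 9. Is this expected when performing conv on the CPU? Is there a way to explain the variance in the conv output, e.g. does it happen because of the algorithm used for small inputs? Everything above refers to conv2d with VALID padding.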
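The manual tf.mul/reduce_sum check can also be reproduced outside TensorFlow. The sketch below emulates float32 arithmetic in pure Python (each product and partial sum rounded to binary32 via `struct`) and runs a VALID, stride-1 convolution with the filter and bias from the example that follows. It is an illustration under assumptions, not TensorFlow's CPU kernel: the two summation orders are hypothetical stand-ins for the different groupings an optimized kernel might choose for different input shapes, and all helper names are made up.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]


def add32(a, b):
    # The binary64 sum of two float32 values, rounded once to binary32,
    # equals the correctly rounded float32 sum.
    return f32(a + b)


def mul32(a, b):
    return f32(a * b)


def valid_conv1d(x, w, bias, order="sequential"):
    """VALID, stride-1 cross-correlation (no filter flip, as in tf.nn.conv2d)
    with every product and partial sum rounded to float32."""
    k = len(w)
    out = []
    for i in range(len(x) - k + 1):
        prods = [mul32(x[i + j], w[j]) for j in range(k)]
        if order == "reversed":   # hypothetical alternative grouping
            prods = prods[::-1]
        elif order != "sequential":
            raise ValueError("unknown order: %r" % order)
        acc = 0.0
        for p in prods:           # left-to-right accumulation
            acc = add32(acc, p)
        out.append(add32(acc, bias))
    return out


w = [f32(v) for v in (-0.7313, -1.1043, 1.8492, 1.3007, -0.1033)]
bias = f32(0.0401)
x0 = [f32(10.0 * t) for t in range(7)]   # T = 7
x1 = [f32(10.0 * t) for t in range(8)]   # T + 1

y0 = valid_conv1d(x0, w, bias)
y1 = valid_conv1d(x1, w, bias)
# With one fixed summation order, appending inputs cannot change earlier outputs.
assert y0 == y1[:len(y0)]
print(y0)
print(valid_conv1d(x0, w, bias, order="reversed"))

# float32 addition is not associative: (1 + 2**-24) + 2**-24 rounds back to 1.0,
# while 1 + (2**-24 + 2**-24) gives 1 + 2**-23.
tiny = 2.0 ** -24
print(add32(add32(1.0, tiny), tiny), add32(1.0, add32(tiny, tiny)))
```

With a single fixed order, a prefix of the output can only change if the kernel itself performs different operations or groups them differently for different input lengths. Because float32 addition is not associative, two groupings of the same five products can legitimately disagree in the last bit or two; this is one plausible explanation for the CPU results, not a confirmed one.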

Example:

# Requires TensorFlow 1.x (uses tf.Session).
import tensorflow as tf

# Consider a length-T tensor x0 where T >= 5. Create another tensor
# x1 by appending an element to the end of x0. If you convolve both
# x0 and x1 with a length-5 filter using VALID padding, you would expect
# the first (T-4) elements of the resulting tensors to have exactly
# the same value, because the calculation is done on the same set
# of numbers. It turns out you can get discrepancies on the order of
# 1e-5 if the convolution is done on the CPU.

#device = '/gpu:0'
device = '/cpu:0'

T = 7  # >=5

def expand_4d(f, n):
    f = tf.expand_dims(f, n[0])
    f = tf.expand_dims(f, n[1])
    f = tf.expand_dims(f, n[2])
    return f

# Convolution Filter and Bias
cnv = tf.cast([-0.7313, -1.1043, 1.8492, 1.3007, -0.1033], tf.float32)
cnv = expand_4d(cnv, [0, -1, -1])
bias = tf.constant([0.0401], tf.float32)

# Input
x0 = 10.0 * tf.cast(tf.range(T), tf.float32)
x1 = 10.0 * tf.cast(tf.range(T+1), tf.float32)
x2 = 10.0 * tf.cast(tf.range(T+2), tf.float32)
x0 = expand_4d(x0, [0, 0, -1])
x1 = expand_4d(x1, [0, 0, -1])
x2 = expand_4d(x2, [0, 0, -1])

# Run Convolution
def my_conv(x):
    with tf.device(device):
        return tf.nn.bias_add(tf.nn.conv2d(x, cnv, strides=[1,1,1,1], padding='VALID'), bias)

y0 = my_conv(x0)
y1 = my_conv(x1)
y2 = my_conv(x2)
n = T - 4   # length of y0

sess = tf.Session()
y0_, y1_, y2_ = sess.run([y0, y1, y2])
print('T =', T)
print('device =', device)
print('y0 = convolution with length-T tensor x0')
print(y0_)
print('y1 = convolution with length-(T+1) tensor x1')
print(y1_)
print('y2 = convolution with length-(T+2) tensor x2')
print(y2_)
# Compare the first n elements of each tensor (should all be equal)
print('y0 - y1[0:%s]' % n)
print(y0_[0][0] - y1_[0][0][0:n])
print('y1[0:%s] - y2[0:%s]' % (n, n))
print(y1_[0][0][0:n] - y2_[0][0][0:n])

Example output:

T = 7
device = /cpu:0
y0 = convolution with length-T tensor x0
[[[[ 60.87010574]
   [ 72.98010254]
   [ 85.09010315]]]]
y1 = convolution with length-(T+1) tensor x1
[[[[ 60.87010574]
   [ 72.98009491]
   [ 85.09008789]
   [ 97.20009613]]]]
y2 = convolution with length-(T+2) tensor x2
[[[[  60.87010574]
   [  72.98009491]
   [  85.09008789]
   [  97.20009613]
   [ 109.31010437]]]]
y0 - y1[0:3]
[[  0.00000000e+00]
 [  7.62939453e-06]
 [  1.52587891e-05]]
y1[0:3] - y2[0:3]
[[ 0.]
 [ 0.]
 [ 0.]]
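The size of these discrepancies can be put in perspective by measuring them in float32 units in the last place (ULPs). A minimal sketch; the `ulp32` helper is hypothetical and only valid for normal, non-zero float32 values:

```python
import math


def ulp32(x):
    """Gap between adjacent float32 values near x (normal, non-zero x only)."""
    _, e = math.frexp(abs(x))   # abs(x) == m * 2**e with 0.5 <= m < 1
    return 2.0 ** (e - 24)      # float32 has a 24-bit significand


# (value, observed CPU discrepancy) pairs copied from the output above.
observed = [(72.98009491, 7.62939453e-06), (85.09008789, 1.52587891e-05)]
for value, diff in observed:
    print(value, ulp32(value), diff / ulp32(value))
```

Values in [64, 128) have a float32 spacing of 2**-17 ≈ 7.63e-06, so the two differences are exactly 1 and 2 ULPs. The CPU results differ by the smallest steps float32 can represent at that magnitude, which is consistent with rounding-order effects rather than a wrong computation.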

Tensorflow conv2d results: CPU vs GPU
