Question

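The drawback of classic CNNs is that we need to greatly increase the number of filters so that each filter is activated by a different object (for example, an umbrella would activate the 67th filter of the first convolution and the x-th filter of the k-th convolution, ...). In addition, CNNs do not take the affinity between pixels into account. To improve the quality of segmentation / alpha matting with neural networks, we need new architectures. I recently came across this paper: https://arxiv.org/pdf/1904.05373.pdf, and I find the idea very interesting. I would like to create a different layer based on the same idea, but for simplicity, let's assume I want to implement PAC as described in that paper.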

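I have some ideas for implementing this with the TensorFlow Python API, but I don't think any of them is fast enough. The idea is that the weights of each conventional CNN filter must be reweighted by a symmetric (kernel) matrix centered on each convolution position (see Figure 1 of the paper). However, tf.nn.conv2d only takes a single set of filters of shape [filter_height, filter_width, filter_input, filter_output], so the filter cannot vary with the position of the window being convolved.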
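To make the operation concrete, here is a minimal NumPy sketch of what I understand PAC to compute: each output pixel is an ordinary filter response whose taps are reweighted by a kernel on the difference between guidance features. It assumes a fixed Gaussian kernel, odd filter size, VALID padding and stride 1; the names unfold, pac_valid, v, f and weights are mine and not from the paper. It is only a reference for the math, not a fast implementation. In TensorFlow, a patch-extraction op such as tf.image.extract_patches might play the role of unfold.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def unfold(x, k):
    """Return all k x k patches of x (H, W, C) as (H-k+1, W-k+1, k, k, C)."""
    # sliding_window_view appends the window axes last: (H', W', C, k, k).
    patches = sliding_window_view(x, (k, k), axis=(0, 1))
    return np.moveaxis(patches, 2, -1)


def pac_valid(v, f, weights):
    """Pixel-adaptive convolution sketch (VALID padding, stride 1, odd k).

    v:       (H, W, C_in)        input values
    f:       (H, W, C_f)         guidance features driving the adaptive kernel
    weights: (k, k, C_in, C_out) ordinary, spatially shared filter
    """
    k = weights.shape[0]
    r = k // 2
    v_patches = unfold(v, k)  # (H', W', k, k, C_in)
    f_patches = unfold(f, k)  # (H', W', k, k, C_f)
    out_h, out_w = f_patches.shape[:2]
    # Feature vector at the center pixel of every window.
    f_center = f[r:r + out_h, r:r + out_w]
    diff = f_patches - f_center[:, :, None, None, :]
    # Assumed Gaussian kernel: 1 at the center, smaller for dissimilar pixels.
    kernel = np.exp(-0.5 * np.sum(diff ** 2, axis=-1))  # (H', W', k, k)
    # Reweight every filter tap by the kernel, then sum over window and C_in.
    return np.einsum('hwij,hwijc,ijco->hwo', kernel, v_patches, weights)
```

With constant guidance features the kernel is 1 everywhere, so the result should reduce to a plain convolution, which is a convenient sanity check.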

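My plan is to use TensorFlow to define a function that mimics tf.nn.conv2d; then, if that code is fast enough, weight the filters by the symmetric kernel; and finally wrap the result in a layer by extending tf.keras.layers.Layer (https://www.tensorflow.org/tutorials/eager/custom_layers). The problem is that, so far, my Python code mimicking tf.nn.conv2d is 100 times slower than tf.nn.conv2d ...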

Here is the code I use to mimic tf.nn.conv2d:

import tensorflow as tf

def inner_conv_x(w, x, y, inp, filters, strides, f_sh, indices, updates):

    # IDEA: for PAC, also pass the adaptive kernel and multiply elementwise:
    # inp[..] * filters * kernel[..]
    # f_sh is [filter_height, filter_width, in_channels, out_channels],
    # so the window spans f_sh[0] rows and f_sh[1] columns.
    val = tf.reduce_sum(
        inp[y * strides[1]:f_sh[0] + y * strides[1], x * strides[2]:f_sh[1] + x * strides[2]] *
        filters, axis=(0, 1, 2))

    idx = y * w + x
    indices = indices.write(idx, [y, x])
    updates = updates.write(idx, val)

    return w, x + 1, y, inp, filters, strides, f_sh, indices, updates


def inner_conv(h, y, w, inp, filters, strides, f_sh, indices, updates):
    cond2 = lambda width, step_x, *args: width > step_x

    _, x, y, _, _, _, _, indices, updates = tf.while_loop(
        cond2, inner_conv_x,
        [w, 0, y, inp, filters, strides, f_sh, indices, updates])

    return h, y + 1, w, inp, filters, strides, f_sh, indices, updates


def my_tf_conv2(inputs, filters, padding, strides):
    f_sh = tf.shape(filters)  # [filter_height, filter_width, in_channels, out_channels]
    in_sh = tf.shape(inputs)  # [batch, height, width, channels]
    _, s_h, s_w, _ = strides

    if padding.upper() == 'SAME':
        out_h = tf.cast(tf.ceil(in_sh[1] / s_h), tf.int32)
        out_w = tf.cast(tf.ceil(in_sh[2] / s_w), tf.int32)

        padding_h = tf.maximum((out_h - 1) * s_h + f_sh[0] - in_sh[1], 0)
        pad_h1 = padding_h // 2
        pad_h2 = padding_h - pad_h1

        padding_w = tf.maximum((out_w - 1) * s_w + f_sh[1] - in_sh[2], 0)
        pad_w1 = padding_w // 2
        pad_w2 = padding_w - pad_w1

        inputs = tf.pad(inputs, [[0, 0]] + [[pad_h1, pad_h2], [pad_w1, pad_w2]] + [[0, 0]])
    elif padding.upper() == 'VALID':
        out_h = tf.cast(tf.ceil((in_sh[1] - f_sh[0] + 1) / s_h), tf.int32)
        out_w = tf.cast(tf.ceil((in_sh[2] - f_sh[1] + 1) / s_w), tf.int32)
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")

    cond1 = lambda height, step_y, *args: height > step_y

    res = []
    inputs = tf.unstack(inputs)
    for inp in inputs:

        y, x = 0, 0
        indices = tf.TensorArray(dtype=tf.int32, size=0, dynamic_size=True)
        updates = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
        inp = tf.expand_dims(inp, axis=-1)

        _, y, _, _, _, _, _, indices, updates = tf.while_loop(
            cond1,
            inner_conv, [out_h, y, out_w, inp, filters, strides, f_sh, indices, updates],
            parallel_iterations=2)

        res.append(tf.scatter_nd(
            indices.stack(),
            updates.stack(),
            shape=(out_h, out_w, f_sh[3])))

    res = tf.stack(res)

    return res
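
As a sanity check on the output-size and padding arithmetic in my_tf_conv2, here is the same computation for a single spatial axis in plain Python. The helper names same_padding and valid_out_size are hypothetical and exist only for this illustration; the formulas mirror the SAME and VALID branches above.

```python
import math


def same_padding(in_size, filter_size, stride):
    """Return (out_size, pad_before, pad_after) for one axis with SAME padding."""
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + filter_size - in_size, 0)
    pad_before = total // 2
    # Any odd leftover pixel goes on the trailing side.
    return out_size, pad_before, total - pad_before


def valid_out_size(in_size, filter_size, stride):
    """Return the output size for one axis with VALID (no) padding."""
    return math.ceil((in_size - filter_size + 1) / stride)


print(same_padding(5, 3, 2))    # (3, 1, 1)
print(same_padding(6, 3, 2))    # (3, 0, 1)
print(valid_out_size(5, 3, 2))  # 2
```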

You can then test it as follows:

my_result2 = my_tf_conv2(tf_inputs, tf_filters, padding="SAME", strides=[1, 2, 2, 1])

with tf.Session() as sess:
    sess.run(tf.initializers.global_variables())
    output = sess.run(my_result2)
    print(output)

I explained my idea in the code comments. The fact is that it is so slow that I have not even started implementing the full PAC layer ...

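So I am wondering: can any of you think of an efficient way to implement a PAC layer (rather than something 100 times slower, or even worse if I keep going in this direction)? Should I consider using tf.nn.depthwise_conv2d? Do you think an efficient version can only be implemented in CUDA rather than with the high-level TensorFlow Python API?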

Thank you all.

How to implement a pixel-adaptive convolution layer

0 answers: