Question

In deep convolutional nets (for example, a CNN for MNIST digit recognition), will the trained network give the correct output if it is given the inverted version of a digit image like those used for training? By virtue of its design, it seems that it should be invariant to this operation, as it is to translation.

Answer 1

In short: no. Convolutions and pooling make networks slightly invariant to translations, but such a model (without anything else added) is still not invariant to rotations, inversions, reflections, etc.

For inversions in particular, which I understand as swapping white and black (new_color = 255 - previous), it is easy to show that the activation functions behave differently. For example, consider a ReLU neuron that, after some convolutions, receives the signal "x" as the result of linearly processing an image of the digit "4". If you completely flip the colors, this "x" might change sign, and the neuron will become inactive (if originally x > 0) or active (otherwise).
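To see why the sign can flip, here is a short derivation for a single linear unit. It assumes pixels are normalized to [0,1] (so inversion is x' = 1 - x, the normalized form of 255 - previous) and that the unit computes z = w·x + b directly on the input pixels, as in the first convolutional layer; the symbols w, b and z are introduced here for illustration only.

```latex
\begin{aligned}
z  &= \mathbf{w}^\top \mathbf{x} + b \\
z' &= \mathbf{w}^\top(\mathbf{1} - \mathbf{x}) + b
    = \Big(\sum_i w_i\Big) - \mathbf{w}^\top \mathbf{x} + b
    = \Big(\sum_i w_i + 2b\Big) - z
\end{aligned}
```

So z' is a decreasing function of z: the more strongly a unit responds to the original image, the more weakly it responds to the inverted one. With the filter from the example that follows (sum of weights 8 - 99 = -91, and no bias, i.e. b = 0) and z = 8, this gives z' = -91 - 8 = -99.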

To illustrate this better, let's look at the simplest example of a convolution, with a 3x3 filter and a 3x3 input (for simplicity I normalize [0,255] to [0,1]):

1 1 1         1   1  1
1 0 1  (x)    1 -99  1   =   1 + 1 + 1 + 1 + 0 + 1 + 1 + 1 + 1 = 8
1 1 1         1   1  1

relu(8) = max(0, 8) = 8

0 0 0         1   1  1
0 1 0  (x)    1 -99  1   =   0 + 0 + 0 + 0 -99 + 0 + 0 + 0 + 0 = -99
0 0 0         1   1  1

relu(-99) = max(0, -99) = 0
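The arithmetic above can be reproduced in a few lines of NumPy. This is a minimal sketch that assumes NumPy is available; because this filter is symmetric under a 180-degree rotation, convolution and cross-correlation give the same result here.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# 3x3 filter from the example above
w = np.array([[1, 1, 1],
              [1, -99, 1],
              [1, 1, 1]], dtype=float)

# Original patch (pixels normalized to [0, 1]) and its color inversion
x = np.array([[1, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
x_inv = 1.0 - x  # normalized form of new_color = 255 - previous

# With a 3x3 input and a 3x3 filter (no padding) the convolution is a single
# element-wise product followed by a sum.
z = float(np.sum(w * x))
z_inv = float(np.sum(w * x_inv))

print(z, relu(z))          # 8.0 8.0
print(z_inv, relu(z_inv))  # -99.0 0.0
```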

This is a deliberately simple example, but the same effect occurs throughout the model: inverting the input changes the pre-activation of every neuron, so the pattern of active and inactive units, and as a consequence the behavior of the whole network, changes.
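The same effect can be checked for a whole convolutional layer. The sketch below is only an illustration under stated assumptions: a single layer of 8 random 3x3 filters with random biases (hypothetical values, not a trained model) applied to a random 28x28 array standing in for an MNIST image, using NumPy's sliding_window_view (NumPy 1.20 or later). It confirms that every pre-activation obeys z' = sum(w) + 2b - z, which follows from expanding w·(1 - x) + b, and reports what fraction of ReLU units switch between active and inactive. It covers the first layer only; deeper layers receive ReLU outputs, which do not transform this simply.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

image = rng.random((28, 28))          # hypothetical stand-in for an MNIST-sized image in [0, 1]
filters = rng.normal(size=(8, 3, 3))  # 8 hypothetical random 3x3 filters
biases = rng.normal(size=8)           # one hypothetical bias per filter

def conv_layer(img):
    # windows[i, j] is the 3x3 patch at (i, j); the result has shape (8, 26, 26)
    windows = sliding_window_view(img, (3, 3))
    return np.einsum("ijkl,fkl->fij", windows, filters) + biases[:, None, None]

z = conv_layer(image)
z_inv = conv_layer(1.0 - image)  # color-inverted input

# Predicted pre-activations for the inverted image: sum(w) + 2b - z, per filter
predicted = filters.sum(axis=(1, 2))[:, None, None] + 2 * biases[:, None, None] - z

# Fraction of ReLU units whose on/off state differs between the two inputs
flipped = np.mean((z > 0) != (z_inv > 0))
print(f"fraction of ReLU units whose on/off state changes: {flipped:.2f}")
```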

Rotations and reflections are affected in a similar way. A CNN is only approximately invariant to small translations and very small rotations; any more significant transformation will change its behavior.
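A small sketch of the difference between translations and rotations for a single filter. It assumes circular (wrap-around) padding so that the shift property holds exactly at the borders; real CNNs usually use zero padding, where it only holds away from the edges, and pooling then turns this shift property into approximate invariance to small shifts. The helper circ_xcorr, the 8x8 bar image and the edge filter are all hypothetical. Shifting the input simply shifts the filter's response, but rotating a horizontal bar by 90 degrees makes a horizontal-edge filter respond with zeros everywhere, which is not the rotated original response.

```python
import numpy as np

def circ_xcorr(x, w):
    """Cross-correlation with an odd-sized filter and circular padding; output has x's shape."""
    kh, kw = w.shape
    out = np.zeros(x.shape, dtype=float)
    for di in range(kh):
        for dj in range(kw):
            # np.roll(x, s)[i] == x[i - s], so this adds w[di, dj] * x[i + di - kh//2, j + dj - kw//2]
            out += w[di, dj] * np.roll(x, shift=(kh // 2 - di, kw // 2 - dj), axis=(0, 1))
    return out

# Hypothetical horizontal-edge filter: top row minus bottom row
w = np.array([[1, 1, 1],
              [0, 0, 0],
              [-1, -1, -1]], dtype=float)

# Hypothetical 8x8 image containing a full-width horizontal bar
x = np.zeros((8, 8))
x[3, :] = 1.0

response = circ_xcorr(x, w)

# Translation: shifting the input shifts the response by the same amount
shifted = np.roll(x, shift=(2, 1), axis=(0, 1))
print(np.allclose(circ_xcorr(shifted, w), np.roll(response, shift=(2, 1), axis=(0, 1))))  # True

# Rotation: the rotated bar is vertical, and this filter does not respond to it at all
rotated = np.rot90(x)
print(np.allclose(circ_xcorr(rotated, w), np.rot90(response)))  # False
```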
