Can't get this autoencoder network to work properly (with convolutional and maxpool layers)

Asked: 2015-10-14 17:10:38

Tags: neural-network dimensionality-reduction conv-neural-network autoencoder lasagne

Autoencoder networks seem to be much trickier than ordinary classifier MLP networks. After several attempts with Lasagne, everything I get in the reconstructed output resembles, at best, a blurry average of all the MNIST database images, with no distinction of what the input digit actually is.

The network structure I chose is the following cascade of layers:

  1. Input layer (28x28)
  2. 2D convolutional layer, filter size 7x7
  3. Max pooling layer, size 3x3, stride 2x2
  4. Dense (fully connected) flattening layer, 10 units (this is the bottleneck)
  5. Dense (fully connected) layer, 121 units
  6. Reshaping layer to 11x11
  7. 2D convolutional layer, filter size 3x3
  8. 2D upscaling layer, factor 2
  9. 2D convolutional layer, filter size 3x3
  10. 2D upscaling layer, factor 2
  11. 2D convolutional layer, filter size 5x5
  12. Feature max pooling (from 31x28x28 to 28x28)

All the 2D convolutional layers have untied biases, sigmoid activations and 31 filters.

All the fully connected layers have sigmoid activations. (A quick sanity check of the resulting feature-map sizes is sketched below.)
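For reference, here is how the feature-map sizes should work out layer by layer, assuming Lasagne's default pad=0 ('valid') convolutions and ignore_border pooling; the conv/pool helpers here are only for this sketch and are not part of the network code:

    # rough shape trace (batch and channel dimensions omitted),
    # assuming pad=0 ('valid') convolutions and ignore_border pooling
    def conv(size, k):      # output side of a valid convolution
        return size - k + 1
    def pool(size, k, s):   # output side of a max-pooling layer
        return (size - k) // s + 1

    s = conv(28, 7)         # conv 7x7             -> 22
    s = pool(s, 3, 2)       # maxpool 3x3, stride 2 -> 10
    # dense 10 (bottleneck) -> dense 121 -> reshape to 11x11
    s = conv(11, 3)         # conv 3x3             -> 9
    s = conv(s * 2, 3)      # upscale x2, conv 3x3 -> 16
    s = conv(s * 2, 5)      # upscale x2, conv 5x5 -> 28
    print s                 # 28, matching the input; the feature max pooling then
                            # collapses the 31 filter maps into a single channel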

The loss function used is squared error, and the update function is adagrad. The chunks used for learning are 100 samples long, trained for 1000 epochs.

For completeness, here is the code I used:

    import theano.tensor as T
    import theano
    import sys
    sys.path.insert(0,'./Lasagne') # local checkout of Lasagne
    import lasagne
    from theano import pp
    from theano import function
    import gzip
    import numpy as np
    from sklearn.preprocessing import OneHotEncoder
    import matplotlib.pyplot as plt
    def load_mnist():
    
        def load_mnist_images(filename):
            with gzip.open(filename, 'rb') as f:
                data = np.frombuffer(f.read(), np.uint8, offset=16)
            # The inputs are vectors now, we reshape them to monochrome 2D images,
            # following the shape convention: (examples, channels, rows, columns)
            data = data.reshape(-1, 1, 28, 28)
            # The inputs come as bytes, we convert them to float32 in range [0,1].
            # (Actually to range [0, 255/256], for compatibility to the version
            # provided at http://deeplearning.net/data/mnist/mnist.pkl.gz.)
            return data / np.float32(256)
    
        def load_mnist_labels(filename):
            # Read the labels in Yann LeCun's binary format.
            with gzip.open(filename, 'rb') as f:
                data = np.frombuffer(f.read(), np.uint8, offset=8)
            # The labels are vectors of integers now, that's exactly what we want.
            return data
    
        X_train = load_mnist_images('train-images-idx3-ubyte.gz')
        y_train = load_mnist_labels('train-labels-idx1-ubyte.gz')
        X_test = load_mnist_images('t10k-images-idx3-ubyte.gz')
        y_test = load_mnist_labels('t10k-labels-idx1-ubyte.gz')
        return X_train, y_train, X_test, y_test
    
    def plot_filters(conv_layer):
        W = conv_layer.get_params()[0]
        W_fn = theano.function([],W)
        params = W_fn()
        ks = np.squeeze(params)
        kstack = np.vstack(ks)
        plt.imshow(kstack,interpolation='none')
        plt.show()
    
    def main():
    
        #theano.config.exception_verbosity="high"
        #theano.config.optimizer='None'
    
        X_train, y_train, X_test, y_test = load_mnist()
        ohe = OneHotEncoder()
    
        y_train = ohe.fit_transform(np.expand_dims(y_train,1)).toarray()
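        # (the one-hot labels are not actually used for training below; the autoencoder reconstructs its input)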
        chunk_len = 100
        visamount = 10
        num_epochs = 1000
        num_filters=31
        dropout_p=.0
        print "X_train.shape",X_train.shape,"y_train.shape",y_train.shape
        input_var = T.tensor4('X')
        output_var = T.tensor4('X')
        conv_nonlinearity = lasagne.nonlinearities.sigmoid
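        # encoder: 28x28 input -> 7x7 conv -> 3x3 (stride 2) max pool -> 10-unit bottleneck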
        net = lasagne.layers.InputLayer((chunk_len,1,28,28), input_var)
        conv1 = net = lasagne.layers.Conv2DLayer(net,num_filters,(7,7),nonlinearity=conv_nonlinearity,untie_biases=True)
        net = lasagne.layers.MaxPool2DLayer(net,(3,3),stride=(2,2))
        net = lasagne.layers.DropoutLayer(net,p=dropout_p)
        #conv2_layer = lasagne.layers.Conv2DLayer(dropout_layer,num_filters,(3,3),nonlinearity=conv_nonlinearity)
        #pool2_layer = lasagne.layers.MaxPool2DLayer(conv2_layer,(3,3),stride=(2,2))
        net = lasagne.layers.DenseLayer(net,10,nonlinearity=lasagne.nonlinearities.sigmoid)
    
        #augment_layer1 = lasagne.layers.DenseLayer(reduction_layer,33,nonlinearity=lasagne.nonlinearities.sigmoid)
        net = lasagne.layers.DenseLayer(net,121,nonlinearity=lasagne.nonlinearities.sigmoid)
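        # decoder: reshape the 121 units into an 11x11 map, then conv/upscale back up to 28x28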
    
        net = lasagne.layers.ReshapeLayer(net,(chunk_len,1,11,11))
    
        net = lasagne.layers.Conv2DLayer(net,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
        net = lasagne.layers.Upscale2DLayer(net,2)
    
        net = lasagne.layers.Conv2DLayer(net,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
        #pool_after0 = lasagne.layers.MaxPool2DLayer(conv_after1,(3,3),stride=(2,2))
        net = lasagne.layers.Upscale2DLayer(net,2)
    
        net = lasagne.layers.DropoutLayer(net,p=dropout_p)
    
        #conv_after2 = lasagne.layers.Conv2DLayer(upscale_layer1,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
        #pool_after1 = lasagne.layers.MaxPool2DLayer(conv_after2,(3,3),stride=(1,1))
        #upscale_layer2 = lasagne.layers.Upscale2DLayer(pool_after1,4)
    
        net = lasagne.layers.Conv2DLayer(net,num_filters,(5,5),nonlinearity=conv_nonlinearity,untie_biases=True)
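        # collapse the 31 feature maps into a single output channel by taking the per-pixel maximum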
        net = lasagne.layers.FeaturePoolLayer(net,num_filters,pool_function=theano.tensor.max)
        print "output_shape:",lasagne.layers.get_output_shape(net)
        params = lasagne.layers.get_all_params(net, trainable=True)
        prediction = lasagne.layers.get_output(net)
        loss = lasagne.objectives.squared_error(prediction, output_var)
        #loss = lasagne.objectives.binary_crossentropy(prediction, output_var)
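        # aggregate() reduces the per-pixel losses to a single scalar (mean by default)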
        aggregated_loss = lasagne.objectives.aggregate(loss)
        updates = lasagne.updates.adagrad(aggregated_loss,params)
        train_fn = theano.function([input_var, output_var], loss, updates=updates)
    
        test_prediction = lasagne.layers.get_output(net, deterministic=True)
        predict_fn = theano.function([input_var], test_prediction)
    
        print "starting training..."
        for epoch in range(num_epochs):
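            # pick a chunk of (up to) chunk_len distinct random training images; the target is the input itself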
            selected = list(set(np.random.random_integers(0,59999,chunk_len*4)))[:chunk_len]
            X_train_sub = X_train[selected,:]
            _loss = train_fn(X_train_sub, X_train_sub)
            print("Epoch %d: Loss %g" % (epoch + 1, np.sum(_loss) / len(X_train)))
            """
            chunk = X_train[0:chunk_len,:,:,:]
            result = predict_fn(chunk)
            vis1 = np.hstack([chunk[j,0,:,:] for j in range(visamount)])
            vis2 = np.hstack([result[j,0,:,:] for j in range(visamount)])
            plt.imshow(np.vstack([vis1,vis2]))
            plt.show()
            """
        print "done."
    
        chunk = X_train[0:chunk_len,:,:,:]
        result = predict_fn(chunk)
        print "chunk.shape",chunk.shape
        print "result.shape",result.shape
        plot_filters(conv1)
        for i in range(chunk_len/visamount):
            vis1 = np.hstack([chunk[i*visamount+j,0,:,:] for j in range(visamount)])
            vis2 = np.hstack([result[i*visamount+j,0,:,:] for j in range(visamount)])
            plt.imshow(np.vstack([vis1,vis2]))
            plt.show()
        import ipdb; ipdb.set_trace()
    
    if __name__ == "__main__":
        main()
    

Any ideas on how to improve this network to get a reasonably working autoencoder?

0 answers:

There are no answers yet.