Neural network performing poorly on the CIFAR-10 dataset

Asked: 2017-10-25 19:09:15

Tags: machine-learning tensorflow computer-vision deep-learning conv-neural-network

I have been trying to implement a CNN on the CIFAR-10 dataset for a few days now. My test set accuracy doesn't seem to go beyond 10%, and the loss just hovers around 69.07733. I have been tuning the model for days to no avail, and I haven't been able to spot where I am going wrong. Please help me identify the fault in the model. Here is the code:

import os
import sys
import pickle
import tensorflow as tf
import numpy as np
from matplotlib import pyplot as plt

data_root = './cifar-10-batches-py'
train_data = np.ndarray(shape=(50000,3072), dtype=np.float32)
train_labels = np.ndarray(shape=(50000), dtype=np.float32)
num_images = 0
test_data = np.ndarray(shape=(10000,3072),dtype = np.float32)
test_labels = np.ndarray(shape=(10000),dtype=np.float32)
meta_data = {}

for file in os.listdir(data_root):
    file_path = os.path.join(data_root,file)
    with open(file_path,'rb') as f:
        temp = pickle.load(f,encoding ='bytes')
        if file == 'batches.meta':
            for i,j in enumerate(temp[b'label_names']):
                meta_data[i] = j
        if 'data_batch_' in file:
            for i in range(10000):
                train_data[num_images,:] = temp[b'data'][i]
                train_labels[num_images] = temp[b'labels'][i]
                num_images += 1
        if 'test_batch' in file:
            for i in range(10000):
                test_data[i,:] = temp[b'data'][i]
                test_labels[i] = temp[b'labels'][i]



'''         
print('meta: \n',meta_data)
train_data = train_data.reshape(50000,3,32,32).transpose(0,2,3,1)
print('\ntrain data: \n',train_data.shape,'\nLabels: \n',train_labels[0])
print('\ntest data: \n',test_data[0].shape,'\nLabels: \n',train_labels[0])'''


#accuracy function acc = (no. of correct prediction/total attempts) * 100
def accuracy(predictions, labels):
    return (100 * (np.sum(np.argmax(predictions,1)== np.argmax(labels, 1))/predictions.shape[0]))

#reformat the data
def reformat(data,labels):
    data = data.reshape(data.shape[0],3,32,32).transpose(0,2,3,1).astype(np.float32)
    labels = (np.arange(10) == labels[:,None]).astype(np.float32)
    return data,labels


train_data, train_labels = reformat(train_data,train_labels)
test_data, test_labels = reformat(test_data, test_labels)
print ('Train ',train_data[0][1])

plt.axis("off")
plt.imshow(train_data[1], interpolation = 'nearest')
plt.savefig("1.png")
plt.show()

'''
print("Train: \n",train_data.shape,test_data[0],"\nLabels: \n",train_labels.shape,train_labels[:11])
print("Test: \n",test_data.shape,test_data[0],"\nLabels: \n",test_labels.shape,test_labels[:11])'''

image_size = 32
num_channels = 3
batch_size = 30
patch_size = 5
depth = 64
num_hidden = 256
num_labels = 10

graph = tf.Graph()

with graph.as_default():

    #input data and labels
    train_input = tf.placeholder(tf.float32,shape=(batch_size,image_size,image_size,num_channels))
    train_output = tf.placeholder(tf.float32,shape=(batch_size,num_labels))
    test_input = tf.constant(test_data)

    #layer weights and biases
    layer_1_weights = tf.Variable(tf.truncated_normal([patch_size,patch_size,num_channels,depth]))
    layer_1_biases = tf.Variable(tf.zeros([depth]))

    layer_2_weights = tf.Variable(tf.truncated_normal([patch_size,patch_size,depth,depth]))
    layer_2_biases = tf.Variable(tf.constant(0.1, shape=[depth]))

    layer_3_weights = tf.Variable(tf.truncated_normal([64*64, num_hidden]))
    layer_3_biases = tf.Variable(tf.constant(0.1, shape=[num_hidden]))

    layer_4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels]))
    layer_4_biases = tf.Variable(tf.constant(0.1, shape=[num_labels]))

    def convnet(data):
        conv_1 = tf.nn.conv2d(data, layer_1_weights,[1,1,1,1], padding = 'SAME')
        hidden_1 = tf.nn.relu(conv_1+layer_1_biases)
        norm_1 = tf.nn.lrn(hidden_1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
        pool_1 = tf.nn.max_pool(norm_1,[1,2,2,1],[1,2,2,1], padding ='SAME')
        conv_2 = tf.nn.conv2d(pool_1,layer_2_weights,[1,1,1,1], padding = 'SAME')
        hidden_2 = tf.nn.relu(conv_2+layer_2_biases)
        norm_2 = tf.nn.lrn(hidden_2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
        pool_2 = tf.nn.max_pool(norm_2,[1,2,2,1],[1,2,2,1], padding ='SAME')
        shape = pool_2.get_shape().as_list()
        hidd2_trans = tf.reshape(pool_2,[shape[0],shape[1]*shape[2]*shape[3]])
        hidden_3 = tf.nn.relu(tf.matmul(hidd2_trans,layer_3_weights) + layer_3_biases)
        return tf.nn.relu(tf.matmul(hidden_3,layer_4_weights) + layer_4_biases)

    logits = convnet(train_input)
    loss = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=train_output, logits = logits))

    optimizer = tf.train.AdamOptimizer(1e-4).minimize(loss)

    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(convnet(test_input))


num_steps = 100000


with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized \n')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch = train_data[offset:(offset+batch_size),:,:,:]
        batch_labels = train_labels[offset:(offset+batch_size),:]
        feed_dict ={train_input: batch, train_output: batch_labels}
        _,l,prediction = session.run([optimizer, loss, train_prediction], feed_dict = feed_dict)
        if (step % 500 == 0):
            print("Loss at step %d: %f" %(step, l))
            print("Accuracy: %f" %(accuracy(prediction, batch_labels)))
    print("Test accuracy: %f" %(accuracy(session.run(test_prediction), test_labels)))

2 Answers:

Answer 0 (score: 0):

At first glance, I would say the initialization of the CNN is the culprit. A convnet is optimized over a highly non-convex space, so it relies heavily on careful initialization to avoid getting stuck in local minima or at saddle points. Look at Xavier initialization for an example of how to fix that.

Example code:

W = tf.get_variable("W", shape=[784, 256],
                    initializer=tf.contrib.layers.xavier_initializer())
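
Applied to the weight variables in the question, that might look like the sketch below (assuming TF 1.x with tf.contrib available; variable and constant names are taken from the question's code):

# xavier_initializer scales the random init by fan-in/fan-out,
# unlike the unscaled tf.truncated_normal (stddev 1.0) in the question;
# it also handles 4-D convolutional filter shapes.
layer_1_weights = tf.get_variable(
    "layer_1_weights",
    shape=[patch_size, patch_size, num_channels, depth],
    initializer=tf.contrib.layers.xavier_initializer())
layer_3_weights = tf.get_variable(
    "layer_3_weights",
    shape=[64*64, num_hidden],
    initializer=tf.contrib.layers.xavier_initializer())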

Answer 1 (score: 0):

The problem is that your network has a very large depth (64 filters in each of the two conv layers), and you are training it from scratch. The CIFAR-10 training set (50,000 images) is quite small for that, and each CIFAR-10 image is only 32x32x3.

One alternative I would suggest is to fine-tune a pre-trained model instead, i.e. transfer learning.
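
A minimal sketch of that idea, assuming tf.keras and its bundled ImageNet weights are available (TF 1.4+); VGG16 accepts 32x32 inputs, and train_data / train_labels are the NHWC, one-hot arrays produced by reformat above:

from tensorflow import keras

# Frozen pre-trained convolutional base plus a small trainable classifier head.
base = keras.applications.VGG16(include_top=False, weights='imagenet',
                                input_shape=(32, 32, 3))
base.trainable = False  # only the new dense layers below are trained
model = keras.models.Sequential([
    base,
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_data, train_labels, batch_size=64, epochs=5)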

The other, better option is to reduce the number of filters in each layer. That way you will be able to train the model from scratch, and training will also be faster (assuming you don't have a GPU).
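
Concretely, that is mostly a change to the constants at the top of your script; the values below are just a starting guess, not tuned:

depth = 16        # down from 64 filters per conv layer
num_hidden = 128  # shrink the fully-connected layer to match

# Note: layer_3_weights hard-codes the flattened size as 64*64, which is
# really 8*8*64 (two 2x2 max-pools take 32x32 down to 8x8); with a different
# depth it must become 8*8*depth.
layer_3_weights = tf.Variable(tf.truncated_normal([8*8*depth, num_hidden]))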

Next, you are using local response normalization. I would suggest you remove that layer and do the normalization in the preprocessing step instead.
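
For example, in NumPy, right after reformat (a sketch: the statistics are computed on the training set only, then applied to both sets):

# Per-channel mean/std normalization as a preprocessing step,
# replacing the in-graph tf.nn.lrn calls.
mean = train_data.mean(axis=(0, 1, 2), keepdims=True)
std = train_data.std(axis=(0, 1, 2), keepdims=True)
train_data = (train_data - mean) / (std + 1e-7)
test_data = (test_data - mean) / (std + 1e-7)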

Next, if you feel the learning is not picking up at all, try increasing the learning rate a little and see what happens.
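
That is a one-line change in the graph; 1e-3 below is only an illustrative value to try:

optimizer = tf.train.AdamOptimizer(1e-3).minimize(loss)  # up from 1e-4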

Lastly, just to cut down on some of the operations in your code: you reshape your tensor and then transpose it in more than one place:

data.reshape(data.shape[0],3,32,32).transpose(0,2,3,1)

Note that you cannot simply reshape straight to (N, 32, 32, 3) here: CIFAR-10 stores each image channel-first (all 1024 red values, then green, then blue), so dropping the transpose would scramble the channels. Instead, do the reshape-and-transpose once, right after loading, and keep the arrays in NHWC layout from then on:

data = data.reshape(data.shape[0], 3, 32, 32).transpose(0, 2, 3, 1)

Hope this answer helps you.