Question

After @IVlad gave me some really useful feedback, I tried modifying my code. The modified part looks like this:

syn0 = (2*np.random.random((784,len(train_sample))) - 1)/8
syn1 = (2*np.random.random((len(train_sample),10)) - 1)/8


for i in xrange(10000):
    #forward propagation
    l0=train_sample
    l1=nonlin(np.dot(l0, syn0))
    l2=nonlin(np.dot(l1, syn1))

    #calculate error
    l2_error=train_tag_bool-l2

    if (i% 1000) == 0:
        print "Error:" + str(np.mean(np.abs(l2_error)))
    #apply sigmoid to the error 
    l2_delta = l2_error*nonlin(l2,deriv=True)

    l1_error = l2_delta.dot(syn1.T)
    l1_delta = l1_error * nonlin(l1,deriv=True)
    #update weights

    syn1 += alpha* (l1.T.dot(l2_delta) - beta*syn1)
    syn0 += alpha* (l0.T.dot(l1_delta) - beta*syn0)

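Note that the tags (the true labels) are now in a <3000 x 10> matrix, where each row is a sample and the ten columns describe which digit each sample represents. (That is train_tag_bool. Now that I think about it, it isn't really in boolean format, so the name is a bit poor, but for the sake of discussion I'll leave it as it is for now.)

In this project I only use one hidden layer between the input and output layers, hoping it is enough to do the job. I have applied a learning rate and weight decay, and made the initial weights smaller.

To calculate the error rate I used the code from the website, which is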

np.mean(np.abs(l2_error))

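The result is 0.1. I don't know what to make of it.

Also, I looked into the l2 layer (which is supposed to be the output layer that gives the prediction), and the values are all extremely small (the maximum for each sample is < 10^-9, and the minimum can go down to 10^-85). This is after only 5 iterations, but I doubt it would make any difference if I ran it for 1k loops or more. If I return the maximum of each row, it is always the 9th element (representing the digit '9'), which is flat-out wrong.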
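One possible reading of the 0.1 figure, offered as a sketch rather than a diagnosis: with one-hot targets over 10 classes, a network whose outputs are all close to zero scores a mean absolute error of almost exactly 1/10, so 0.1 is what a collapsed output layer would produce. The snippet uses made-up labels (labels, targets and outputs are illustrative names, not variables from the code above) and also computes classification accuracy via argmax, which tends to be easier to interpret.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 3000 random digit labels, one-hot encoded.
labels = rng.integers(0, 10, size=3000)
targets = np.eye(10)[labels]                 # shape (3000, 10)

# Outputs that have collapsed to (almost) zero, as described above.
outputs = np.full((3000, 10), 1e-9)

mean_abs_error = np.mean(np.abs(targets - outputs))
print(mean_abs_error)                        # very close to 0.1

# Accuracy compares the predicted digit with the true digit per sample.
predicted = outputs.argmax(axis=1)
accuracy = np.mean(predicted == labels)
print(accuracy)
```

If the accuracy stays near chance while the mean absolute error sits at 0.1, the network has most likely not learned anything yet.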

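I am stuck on this problem again. The overflow issue has been the biggest challenge of my entire ML experience (back then it was MATLAB, not Numpy), and I still haven't found a way to solve it...

The train_tag_bool code: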

train_tag_bool=np.array([[0]*10]*len(train_tag)).astype('float64')
for i in range(len(train_tag)):
    if train_tag[i]==0:
        train_tag_bool[i][0]=1
    elif train_tag[i]==1:
        train_tag_bool[i][1]=1
    elif train_tag[i]==2:
        train_tag_bool[i][2]=1
    elif train_tag[i]==3:
        train_tag_bool[i][3]=1
    elif train_tag[i]==4:
        train_tag_bool[i][4]=1
    elif train_tag[i]==5:
        train_tag_bool[i][5]=1
    elif train_tag[i]==6:
        train_tag_bool[i][6]=1
    elif train_tag[i]==7:
        train_tag_bool[i][7]=1
    elif train_tag[i]==8:
        train_tag_bool[i][8]=1
    elif train_tag[i]==9:
        train_tag_bool[i][9]=1

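Brute force, I know, but that is the least of my concerns right now. The result is a 3000 x 10 matrix with a 1 in the position corresponding to each sample's digit. The first element represents the digit 0 and the last represents 9.

E.g. [0 0 0 0 0 0 1 0 0 0] represents 6, and [1 0 0 0 0 0 0 0 0 0] represents 0.

Original code: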

import cPickle, gzip
import numpy as np

#from deeplearning.net
# Load the dataset
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()





#sigmoid function
def nonlin(x, deriv=False):
    if (deriv ==True):
        return x*(1-x)
    return 1/(1+np.exp(-x))

#seed random numbers to make calculation
#deterministic (just a good practice)

np.random.seed(1)




#need to decrease the sample size or else computer dies
train_sample=train_set[0][0:3000]
train_tag=train_set[1][0:3000]
train_tag=train_tag.reshape(len(train_tag), 1)

#train_set's dimension for the pixels are 50000(samples) x 784 (28x28 for each sample)
#therefore the coefficients should be 784x50000 to make the hidden layer 50k x 50k

syn0 = 2*np.random.random((784,len(train_sample))) - 1
syn1 = 2*np.random.random((len(train_sample),1)) - 1


for i in xrange(10000):
    #forward propagation
    l0=train_sample
    l1=nonlin(np.dot(l0, syn0))
    l2=nonlin(np.dot(l1, syn1))

    #calculate error
    l2_error=train_tag-l2

    if (i% 1000) == 0:
        print "Error:" + str(np.mean(np.abs(l2_error)))
    #apply sigmoid to the error 
    l2_delta = l2_error*nonlin(l2,deriv=True)

    l1_error = l2_delta.dot(syn1.T)
    l1_delta = l1_error * nonlin(l1,deriv=True)
    #update weights

    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)

References:

http://iamtrask.github.io/2015/07/12/basic-python-network/

http://yann.lecun.com/exdb/mnist/

Answer 1

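I can't run the code at the moment, but a few things stand out. I'm surprised it even works well on the toy problems the blog uses.

Before we begin, you need more output neurons: 10, to be exact.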

syn1 = 2*np.random.random((len(train_sample), 10)) - 1

Your labels (y) should preferably be arrays of length 10, with a 1 in the position of the correct digit and 0 everywhere else.
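A minimal sketch of building such labels without a chain of if/elif branches, assuming the digits arrive as an integer array. The helper name one_hot is illustrative and not part of the original code.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) float64 array of one-hot rows."""
    # ravel() also accepts labels that were reshaped to a column, e.g. (n, 1).
    labels = np.asarray(labels, dtype=int).ravel()
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes)[labels]

y = one_hot([6, 0, 9])
print(y)
```

Indexing the identity matrix by the label array picks one row per sample, so the whole conversion is a single vectorized step.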

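First, one thing I always try by default is to use float64 wherever possible... which almost never changes anything, so I'm not sure whether you should get into this habit. Probably not.

Second, that code has no learning rate that you can set. This means the learning rate is implicitly 1, which is very large for your problem, where people use 0.01 or even less. To add a learning rate alpha, do: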

syn1 += alpha * l1.T.dot(l2_delta)
syn0 += alpha * l0.T.dot(l1_delta)

And set it to at most 0.01. You will have to fiddle with it to get the best results.
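To see why an implicit learning rate of 1 can be a problem, here is a toy sketch on a one-dimensional quadratic rather than on the network itself. The function and the curvature value of 50 are assumptions chosen purely for illustration.

```python
def gradient_descent(alpha, steps=20, w=1.0, curvature=50.0):
    """Minimize f(w) = 0.5 * curvature * w**2 with plain gradient descent."""
    for _ in range(steps):
        grad = curvature * w      # derivative of f at w
        w -= alpha * grad         # each step multiplies w by (1 - alpha * curvature)
    return w

# alpha = 1.0: each step multiplies w by -49, so the iterate blows up.
print(abs(gradient_descent(alpha=1.0)))
# alpha = 0.01: each step multiplies w by 0.5, so the iterate shrinks toward 0.
print(abs(gradient_descent(alpha=0.01)))
```

The same mechanism applies per direction in weight space: once the step is too large relative to the curvature, the updates overshoot and grow instead of settling.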

Third, it is usually better to initialize the network with smaller weights. [0, 1) is probably too large. Try:

syn0 = (np.random.random((784,len(train_sample))) - 0.5) / 4
syn1 = (np.random.random((len(train_sample),1)) - 0.5) / 4

If you're interested, you can look up more involved initialization schemes, but I have gotten decent results with the above.
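As one example of a more involved scheme, here is a sketch of Glorot (Xavier) uniform initialization, which scales the range by the layer sizes. This is a named alternative, not what the answer above uses, and the 200 hidden units are an illustrative choice.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Sample a (fan_in, fan_out) weight matrix from U(-limit, limit)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(1)
syn0 = glorot_uniform(784, 200, rng)   # input -> hidden (200 is an assumption)
syn1 = glorot_uniform(200, 10, rng)    # hidden -> output
print(syn0.shape, np.abs(syn0).max())
```

Wider layers get a smaller range, which keeps the sums feeding each neuron from growing with the number of inputs.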

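Fourth, regularization. The easiest to implement is probably weight decay. A weight decay coefficient lambda can be implemented like this (lambda is a reserved word in Python, so the code spells it lambda_):

syn1 += alpha * l1.T.dot(l2_delta) - alpha * lambda_ * syn1
syn0 += alpha * l0.T.dot(l1_delta) - alpha * lambda_ * syn0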

Common values here are also < 0.1 or even < 0.01.
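A small sketch of what the decay term does on its own, assuming the data term contributes nothing. The name lambda_ stands in for the decay coefficient (lambda itself is reserved in Python), and all values are illustrative.

```python
import numpy as np

alpha = 0.01      # learning rate
lambda_ = 0.01    # weight decay coefficient (illustrative value)

weights = np.array([1.0, -2.0, 0.5])
grad_step = np.zeros_like(weights)   # pretend the data term contributes nothing

for _ in range(1000):
    # Data term minus alpha * lambda_ * weights, as in the update above.
    weights += alpha * grad_step - alpha * lambda_ * weights

# Each weight has been multiplied by (1 - alpha * lambda_) ** 1000.
print(weights)
```

With no gradient pushing back, every weight shrinks geometrically toward zero, which is the pull that keeps weights from growing without bound during training.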

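Dropout can help too, but in my opinion it is harder to implement and understand if you are just getting started. It is also more useful for deeper networks, AFAIK. So maybe leave this for last.

Fifth, maybe also use momentum (explained in the weight decay link), which can reduce the learning time of your network. Also tune the number of iterations: you don't want too many, but not too few either.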
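A sketch of classical momentum on a toy quadratic, assuming the simplest form of the update (a velocity that accumulates past gradients). The function minimize and its parameter values are illustrative and are not taken from the network code.

```python
def minimize(alpha, momentum, steps=500, w=1.0):
    """Minimize f(w) = 0.5 * w**2 using gradient descent with classical momentum."""
    velocity = 0.0
    for _ in range(steps):
        grad = w                                       # derivative of 0.5 * w**2
        velocity = momentum * velocity - alpha * grad  # accumulate past gradients
        w += velocity
    return w

w_plain = minimize(alpha=0.01, momentum=0.0)      # momentum 0 is plain gradient descent
w_momentum = minimize(alpha=0.01, momentum=0.9)
print(abs(w_plain), abs(w_momentum))              # the momentum run ends much closer to 0
```

With the same learning rate and number of steps, the run with momentum gets considerably closer to the minimum, which is the sense in which it reduces learning time.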

Sixth, look into softmax for the output layer.
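A minimal sketch of a row-wise softmax, assuming the raw output scores are arranged as one row per sample. Subtracting the row maximum before exponentiating is a standard trick that leaves the result unchanged while avoiding overflow.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax for a (samples, classes) array of raw scores."""
    shifted = scores - scores.max(axis=1, keepdims=True)   # avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

scores = np.array([[1000.0, 1001.0, 999.0],
                   [-5.0, 0.0, 5.0]])
probs = softmax(scores)
print(probs.sum(axis=1))      # each row sums to 1
print(probs.argmax(axis=1))   # index of the most likely class per row
```

When softmax is paired with a cross-entropy loss, the error signal at the output layer simplifies to the predicted probabilities minus the one-hot targets.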

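Seventh, look into tanh instead of your current nonlin sigmoid function.

If you apply these incrementally, you should start getting some meaningful results. I think regularization and smaller initial weights should help with the overflow errors.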
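Separately from the suggestions above, the sigmoid itself can be written so that exp is never evaluated on a huge argument. The sketch below relies on the identity sigmoid(x) = 0.5 * (1 + tanh(x / 2)); the name stable_sigmoid is illustrative.

```python
import numpy as np

def stable_sigmoid(x):
    """Logistic sigmoid computed via tanh, so exp is never called on a huge value."""
    return 0.5 * (1.0 + np.tanh(0.5 * x))

x = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
print(stable_sigmoid(x))
```

The direct form 1/(1+np.exp(-x)) can trigger overflow warnings for inputs like -1000, whereas tanh saturates quietly. This only silences the numerical warning; saturated units still learn very slowly, so the smaller weights and learning rate remain the more important changes.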

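Update:

I have changed the code as shown below. After only 100 training epochs, the accuracy is 84.79%. Not bad for barely tuning anything.

I added bias neurons, momentum and weight decay, used fewer hidden units (it was far too slow with what you had), switched to the tanh function, and made a few other changes.

You should be able to tune it further from here. I use Python 3.4, so I had to change a few things to get it running, but nothing important.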

import pickle, gzip
import numpy as np

#from deeplearning.net
# Load the dataset
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
f.close()





#tanh activation; with deriv=True, x must already be the tanh output
def nonlin(x, deriv=False):
    if (deriv ==True):
        return 1-x*x
    return np.tanh(x)

#seed random numbers to make calculation
#deterministic (just a good practice)

np.random.seed(1)

def make_proper_pairs_from_set(data_set):
    data_set_x, data_set_y = data_set

    data_set_y = np.eye(10)[:, data_set_y].T

    return data_set_x, data_set_y


train_x, train_y = make_proper_pairs_from_set(train_set)
train_x = train_x
train_y = train_y

test_x, test_y = make_proper_pairs_from_set(test_set)

print(len(train_y))

#train_set's dimension for the pixels are 50000(samples) x 784 (28x28 for each sample)
#with the bias column added, syn0 is 785 x 200 and syn1 is 201 x 10

# changed to 200 hidden neurons, should be plenty
syn0 = (2*np.random.random((785,200)) - 1) / 10
syn1 = (2*np.random.random((201,10)) - 1) / 10

velocities0 = np.zeros(syn0.shape)
velocities1 = np.zeros(syn1.shape)

alpha = 0.01
beta = 0.0001
momentum = 0.99

m = len(train_x) # number of training samples

# moved the forward propagation to a function and added bias neurons
def forward_prop(set_x, m):

    l0 = np.c_[np.ones((m, 1)), set_x]

    l1 = nonlin(np.dot(l0, syn0))
    l1 = np.c_[np.ones((m, 1)), l1]

    l2 = nonlin(np.dot(l1, syn1))


    return l0, l1, l2, l2.argmax(axis=1)

num_epochs = 100
for i in range(num_epochs):
    # forward propagation

    l0, l1, l2, _ = forward_prop(train_x, m)

    # calculate error
    l2_error = l2 - train_y


    print("Error " + str(i) + ": " + str(np.mean(np.abs(l2_error))))
    # scale the error by the derivative of the activation
    l2_delta = l2_error * nonlin(l2,deriv=True)

    l1_error = l2_delta.dot(syn1.T)
    l1_delta = l1_error * nonlin(l1,deriv=True)
    l1_delta = l1_delta[:, 1:]

    # update weights
    # divide gradients by the number of samples
    grad0 = l0.T.dot(l1_delta) / m
    grad1 = l1.T.dot(l2_delta) / m

    v0 = velocities0
    v1 = velocities1

    velocities0 = velocities0 * momentum - alpha * grad0
    velocities1 = velocities1 * momentum - alpha * grad1


    # divide regularization by number of samples
    # because L2 regularization reduces to this
    syn1 += -v1 * momentum + (1 + momentum) * velocities1 - alpha * beta * syn1 / m
    syn0 += -v0 * momentum + (1 + momentum) * velocities0 - alpha * beta * syn0 / m



# find accuracy on test set

predictions = []
corrects = []
for i in range(len(test_x)): # you can eliminate this loop too with a bit of work, but this part is very fast anyway
    _, _, _, rez = forward_prop([test_x[i, :]], 1)

    predictions.append(rez[0])
    corrects.append(test_y[i].argmax())

predictions = np.array(predictions)
corrects = np.array(corrects)

print(np.sum(predictions == corrects) / len(test_x))

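Update 2:

If you increase the learning rate to 0.05 and the number of epochs to 1000, you get 95.43% accuracy.

Seeding the random number generator with the current time, adding more hidden neurons (or hidden layers), and doing more parameter tuning can get this simple model to about 98% accuracy, AFAIK. The problem is that it is slow to train.

Also, this methodology is not sound. I tuned the parameters to increase accuracy on the test set, so I may be overfitting the test set. You should use cross validation or a validation set.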
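A sketch of how the tuning could be routed through a validation set instead, assuming a caller-supplied function that trains with a given setting and returns validation accuracy. Both pick_best and toy_score are hypothetical; toy_score is only a stand-in so that the example runs and does not reflect real results. The pickle loaded above already provides valid_set for this purpose.

```python
def pick_best(candidates, train_and_score):
    """Return the candidate with the highest validation score, plus that score."""
    scored = [(train_and_score(c), c) for c in candidates]
    best_score, best = max(scored)
    return best, best_score

def toy_score(alpha):
    # Stand-in scorer. A real one would train on train_set with learning
    # rate alpha and return the accuracy measured on valid_set.
    return 1.0 - abs(alpha - 0.05)

best_alpha, best_score = pick_best([0.001, 0.01, 0.05, 0.5], toy_score)
print(best_alpha, best_score)
```

The test set is then evaluated once, with the chosen setting, so its accuracy remains an honest estimate.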

Anyway, as you can see, there are no overflow errors. If you want to discuss things in more detail, feel free to email me (the address is in my profile).

Neural network on MNIST - unexpected results
