Question

I am working through a tutorial (code here) and the accompanying video here (at the 13:00 mark).

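The only change I made is to load the MNIST training set from a different source (and create a one-hot encoding of the labels), but it does not work correctly. I copied and pasted all the code from the example directly (except for the MNIST loading). Here is the code: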

import theano
from theano import tensor as T
import numpy as np
from sklearn.datasets import fetch_mldata
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

mnist = fetch_mldata("MNIST Original")
trX, teX, trY_digit, teY_digit = train_test_split(mnist.data, mnist.target, test_size=.4)

# Get one-hot encoding
# NOTE: sparse_to_floatX is a helper whose definition is not included in this post.
enc = OneHotEncoder()
enc.fit([[n] for n in range(10)])
trY = sparse_to_floatX(enc.transform(trY_digit[:, np.newaxis]))
teY = sparse_to_floatX(enc.transform(teY_digit[:, np.newaxis]))

def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    return theano.shared(floatX(np.random.randn(*shape) * 0.1))

def model(X, w):
    return T.nnet.softmax(T.dot(X, w))

X = T.fmatrix()
Y = T.fmatrix()

w = init_weights((784, 10))

py_x = model(X, w)
y_pred = T.argmax(py_x, axis=1)

cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
gradient = T.grad(cost=cost, wrt=w)
update = [[w, w - gradient * 0.05]]

train = theano.function(inputs=[X, Y], outputs=cost, updates=update, allow_input_downcast=True)
predict = theano.function(inputs=[X], outputs=y_pred, allow_input_downcast=True)

for i in range(10):
    print w.get_value()
    cost = train(trX, trY)
    print i, predict(teX)
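
The one-hot step in the code above depends on sparse_to_floatX, whose definition is not shown. As a point of comparison, a minimal NumPy-only sketch of the same encoding might look like the following. The name one_hot and the float32 dtype are assumptions made for illustration and are not taken from the tutorial.

```python
import numpy as np

def one_hot(labels, n_classes=10):
    """Return a dense (n_samples, n_classes) one-hot matrix (hypothetical helper)."""
    # Labels may arrive as floats (e.g. 3.0), so cast them to integer indices.
    labels = np.asarray(labels).astype(int)
    out = np.zeros((labels.shape[0], n_classes), dtype=np.float32)
    # Set exactly one entry per row: the column given by that row's label.
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out

encoded = one_hot([3.0, 0.0, 9.0])
print(encoded.shape)
```

Because the result is already a dense float array, it could be passed to train directly without a sparse-to-dense conversion.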

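The weight vector updates once and then becomes all NaN on the second update. I am new to Theano, but I am looking for tips on tracking down this problem, especially from anyone who has already worked through this tutorial.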

Update. It looks like the gradient is the problem.

When I add this:

the_grad = T.sum(gradient)
f_grad = theano.function(inputs=[X, Y], outputs=the_grad, allow_input_downcast=True)
print f_grad(trX, trY)

it prints NaN. This seems to be the correct usage of T.grad.
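
One way to tell whether the NaN comes from the gradient expression itself or from the numbers fed into it is to reproduce the gradient outside Theano. The sketch below is not part of the tutorial: the function names are hypothetical, and it uses small random float64 inputs in [0, 1). It compares the closed-form gradient of the mean cross-entropy of a softmax model, X^T (P - Y) / n, against a central finite-difference estimate.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax; subtracting the row maximum avoids overflow in exp.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_cross_entropy(X, Y, w):
    # Mean over samples of the categorical cross-entropy.
    p = softmax(X.dot(w))
    return -np.mean(np.sum(Y * np.log(p), axis=1))

def analytic_grad(X, Y, w):
    # Closed-form gradient for softmax + mean cross-entropy: X^T (P - Y) / n.
    p = softmax(X.dot(w))
    return X.T.dot(p - Y) / X.shape[0]

def numeric_grad(X, Y, w, eps=1e-6):
    # Central finite differences, one weight at a time (slow, for checking only).
    g = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        step = np.zeros_like(w)
        step[idx] = eps
        g[idx] = (mean_cross_entropy(X, Y, w + step)
                  - mean_cross_entropy(X, Y, w - step)) / (2 * eps)
    return g

rng = np.random.RandomState(0)
X = rng.rand(5, 4)                          # 5 samples, 4 features in [0, 1)
Y = np.eye(3)[rng.randint(0, 3, size=5)]    # one-hot labels for 3 classes
w = rng.randn(4, 3) * 0.1

max_diff = np.abs(analytic_grad(X, Y, w) - numeric_grad(X, Y, w)).max()
print(max_diff)
```

If the two gradients agree on small, well-scaled inputs like these, the gradient formula is probably not at fault, and the scale of the real inputs becomes the next thing to inspect.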

Update 2. When I change the cost function to:

cost = T.mean(T.sum(T.sqr(py_x - Y), axis=1), axis=0)

it now works, but I only get 70% accuracy, which is really bad.

Update 3. I downloaded the MNIST data used in the tutorial, and with that data it works and reaches 92% accuracy.

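I am not sure why my first MNIST data source blew up with the cross-entropy cost and then performed so poorly with the mean squared error cost function.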
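
One possible explanation, which nothing in this post confirms, is input scale. If the first data source delivers raw pixel values in the range 0 to 255 while the tutorial's data is scaled to the range 0 to 1 (an assumption here), the softmax can saturate to exact zeros and the cross-entropy then evaluates log(0). The sketch below uses random synthetic data standing in for MNIST, float32 arithmetic, and hypothetical function names. It only illustrates the mechanism and is not a diagnosis of the actual data.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax; subtracting the row maximum avoids overflow in exp,
    # but very negative entries can still underflow to exactly 0.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_cross_entropy(p, y):
    # log(0) gives -inf and 0 * -inf gives nan; silence the warnings so the
    # non-finite result can be inspected instead.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.mean(-np.sum(y * np.log(p), axis=1))

rng = np.random.RandomState(0)
# Synthetic stand-in for raw pixel data: integers 0..255, 784 features.
X_raw = rng.randint(0, 256, size=(64, 784)).astype(np.float32)
Y = np.eye(10, dtype=np.float32)[rng.randint(0, 10, size=64)]
w = (rng.randn(784, 10) * 0.1).astype(np.float32)  # same init scale as the post

raw_loss = mean_cross_entropy(softmax(X_raw.dot(w)), Y)
scaled_loss = mean_cross_entropy(softmax((X_raw / 255).dot(w)), Y)

print("raw inputs:   ", raw_loss)
print("scaled inputs:", scaled_loss)
```

If the real data does turn out to be in the 0 to 255 range, dividing trX and teX by 255 before training would be the corresponding thing to try.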

Theano - logistic regression example: weight vector becomes NaN?

0 answers: