Question

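I am new to Theano. I have learned the basics and am trying to implement simple models (logistic regression and so on). The model is very simple: 784 (28 * 28) input units and a 10-unit softmax nonlinearity, trained on the MNIST dataset. I use binary_crossentropy as the loss function and an L2 regularizer to prevent overfitting. But the model still seems to overfit (judging from the model's weights, shown below). I tried changing the regularization parameter (lambda), but nothing worked. What am I doing wrong? Thanks in advance.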

# theano stuff
from theano import shared, function, pp
import theano.tensor as T
import numpy as np
import matplotlib.pyplot as plt
n_feat = 28*28
m_sample = 60000
n_class = 10
W_shape = (n_class, n_feat)
B_shape = (1, n_class)
W_param = np.random.random(W_shape)
B_param = np.random.random(B_shape)

W = shared(W_param, name='W', borrow=True)
B = shared(B_param, name='B', borrow=True, broadcastable=(True, False))
X = T.dmatrix('X') # has to be of (mxn)
O = T.nnet.softmax(X.dot(W.transpose())+B)
prediction = T.argmax(O, axis=1)
L = T.dmatrix('L')
lam = 0.05 # regularization parameter lambda

# loss_meansqr = (((O-L)**2).mean()).mean()
# loss_meansqr_reg = (((O-L)**2).mean()).mean() + lam *((W**2).mean()+(B**2).mean())
# loss_binxent = T.nnet.binary_crossentropy(O,L).mean()

loss_binxent_reg = T.nnet.binary_crossentropy(O,L).mean() + lam*((W**2).mean()+(B**2).mean()) # i'm using this one
loss = loss_binxent_reg
gW = T.grad(loss, W)
gB = T.grad(loss, B)
lr = T.dscalar('lr')
upds = [(W, W-lr*gW), (B, B-lr*gB)]
print('Compiling functions...')
train = function([X,L,lr], [loss], updates=upds)  # one gradient step; returns the loss
predict = function([X],prediction)  # predicted class index for each row of X
print('Functions compiled')

The weights look like this: [image: The weights of the model]

Answer 1

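I'm not sure whether this is what is causing the problem, but shouldn't the loss function be categorical cross-entropy rather than binary cross-entropy?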

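The MNIST task is to assign each image to exactly one class; an image cannot belong to several classes. Binary cross-entropy is the appropriate loss when an item can belong to multiple classes, and categorical cross-entropy is the appropriate loss when an item belongs to exactly one class.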
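To make the difference concrete, here is a minimal pure-Python sketch of the two losses applied to the same softmax output and one-hot label. The helper names and the example probabilities are made up for illustration; in Theano the corresponding change would presumably be to swap T.nnet.binary_crossentropy(O, L) for T.nnet.categorical_crossentropy(O, L) in the loss.

```python
import math

def categorical_crossentropy(probs, one_hot):
    # -sum_k t_k * log(p_k): only the probability of the true class contributes.
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot))

def binary_crossentropy(probs, targets):
    # Mean over outputs of -(t*log(p) + (1-t)*log(1-p)):
    # every output is treated as an independent yes/no decision.
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for p, t in zip(probs, targets)]
    return sum(terms) / len(terms)

# Hypothetical softmax output for a 3-class problem; the true class is 0.
probs = [0.7, 0.2, 0.1]
one_hot = [1, 0, 0]

cce = categorical_crossentropy(probs, one_hot)
bce = binary_crossentropy(probs, one_hot)
print("categorical:", cce)
print("binary:     ", bce)
```

The two values differ because the binary form also scores the "not this class" outputs and averages over all of them, whereas the categorical form scores only the true class.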

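I would also suggest trying this without any regularization (remove that term from the loss function entirely while testing), and making sure your learning rate is small enough (e.g. 0.001 should work).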
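As a rough illustration of that debugging setup, the sketch below runs plain full-batch gradient descent on a softmax classifier with categorical cross-entropy, the L2 term switched off (lam=0.0), and a learning rate of 0.001. It is pure Python on a made-up three-sample toy dataset, not the asker's Theano graph; all function names and the data are assumptions for illustration only.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss_and_grads(W, b, X, Y, lam=0.0):
    """Mean categorical cross-entropy plus an optional penalty lam * mean(W**2)."""
    n_class, n_feat, n = len(W), len(W[0]), len(X)
    gW = [[0.0] * n_feat for _ in range(n_class)]
    gb = [0.0] * n_class
    loss = 0.0
    for x, y in zip(X, Y):
        z = [sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(n_class)]
        p = softmax(z)
        loss -= math.log(p[y]) / n
        for k in range(n_class):
            # Gradient of softmax + cross-entropy w.r.t. the logit: p_k - t_k
            d = (p[k] - (1.0 if k == y else 0.0)) / n
            gb[k] += d
            for j in range(n_feat):
                gW[k][j] += d * x[j]
    if lam:
        size = n_class * n_feat
        loss += lam * sum(w * w for row in W for w in row) / size
        for k in range(n_class):
            for j in range(n_feat):
                gW[k][j] += 2.0 * lam * W[k][j] / size
    return loss, gW, gb

def gd_step(W, b, gW, gb, lr):
    # In-place gradient descent update of weights and biases
    for k in range(len(W)):
        b[k] -= lr * gb[k]
        for j in range(len(W[k])):
            W[k][j] -= lr * gW[k][j]

# Toy data (hypothetical): 2 features, 3 classes, one sample per class
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [0, 1, 2]
W = [[0.0, 0.0] for _ in range(3)]
b = [0.0, 0.0, 0.0]

losses = []
for step in range(200):
    loss, gW, gb = loss_and_grads(W, b, X, Y, lam=0.0)  # no regularization
    losses.append(loss)
    gd_step(W, b, gW, gb, lr=0.001)

print("first loss:", losses[0])
print("last loss: ", losses[-1])
```

If the unregularized loss goes down steadily at a small learning rate, the basic model and gradients are probably fine, and the penalty can be reintroduced afterwards by passing a nonzero lam.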

Logistic regression in Theano
