How do I avoid overfitting on sequential data?

Asked: 2018-06-25 16:24:51

Tags: machine-learning neural-network recurrent-neural-network

I am trying to solve a text classification problem with Keras and TensorFlow, and I am seeing some overfitting whose cause I would like to understand. Specifically, given the text of a page (with its URL removed, of course), my task is to predict which of a set of candidate URLs contains a link to that page.

I first tried a simple FastText classifier. This worked reasonably well, reaching about 60% training accuracy and 40% test accuracy (there are roughly 20 different classes).

However, I then tried building my own model, and even with a very simple one I get about 97% training accuracy, which is where the overfitting shows up. I am currently using Dropout for regularization, but I have also tried weight decay with similar results.
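For concreteness, the weight-decay variant looked roughly like this (a minimal sketch; the l1_l2 coefficients are illustrative values, not the ones I actually tuned):

    from keras.layers import Dense
    from keras.regularizers import l1_l2

    # Sketch: the same Dense layer as in the model below, but with an
    # L1/L2 penalty on its kernel added to the loss. The coefficients
    # here are example values only.
    representation = Dense(50,
                           kernel_regularizer=l1_l2(l1=0.0, l2=1e-4),
                           name="Representation")(transformed)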

I am still fairly new to neural networks and Keras/TensorFlow, so apologies in advance if this is just a silly mistake on my part.

Here is my network:

    from keras.layers import (Dense, Dropout, Embedding, GlobalAveragePooling1D,
                              Input, Reshape)
    from keras.models import Model, Sequential
    from keras.optimizers import SGD

    max_features = 20000    # vocabulary size
    maxlen = 30             # tokens per document
    embedding_dims = 50
    n_labels = 20           # roughly 20 classes, as mentioned above

    visible = Input(shape=(maxlen,), name="Input")
    embedding = Embedding(max_features + 1,
                          embedding_dims,
                          input_length=maxlen,
                          name="Embedding")(visible)
    embedding = Dropout(rate=0.5)(embedding)

    # Flatten the (maxlen, embedding_dims) output into a single vector
    transformed = Reshape((-1,))(embedding)
    # Linear projection (no activation) down to a 50-dim representation
    representation = Dense(50, name="Representation")(transformed)
    representation = Dropout(rate=0.5)(representation)
    output = Dense(n_labels, activation='softmax', name="Output")(representation)
    model = Model(inputs=visible, outputs=output)

    sgd = SGD(lr=0.01, momentum=0.8, decay=0.0, nesterov=False)
    model.compile(loss='categorical_crossentropy',
                  optimizer=sgd,
                  metrics=['accuracy'])
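For reference, I train the model roughly like this (a sketch; X_train, y_train, X_test, y_test stand in for my padded index sequences and one-hot labels, and the batch size and epoch count are illustrative):

    # Sketch of the training call: passing the held-out set as validation
    # data makes the train/test accuracy gap visible after every epoch.
    history = model.fit(X_train, y_train,
                        batch_size=32,
                        epochs=20,
                        validation_data=(X_test, y_test))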

And here is the FastText implementation:

    def build_fasttext_model():
        max_features = 20000
        maxlen = 30
        embedding_dims = 50

        model = Sequential()
        # We start off with an efficient embedding layer which maps
        # our vocab indices into embedding_dims dimensions
        model.add(Embedding(max_features + 1,
                            embedding_dims,
                            input_length=maxlen))
        model.add(Dropout(rate=0.5))

        # GlobalAveragePooling1D averages the embeddings of all words
        # in the document
        model.add(GlobalAveragePooling1D())
        model.add(Dropout(rate=0.5))

        # Project onto an n_labels-unit output layer and squash it
        # with a softmax
        model.add(Dense(n_labels, activation='softmax'))

        sgd = SGD(lr=0.01, momentum=0.8, decay=0.0, nesterov=False)
        model.compile(loss='categorical_crossentropy',
                      optimizer=sgd,
                      metrics=['accuracy'])
        return model
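In case it is relevant, the inputs are built roughly like this (a sketch; texts and labels stand in for my actual data):

    from keras.preprocessing.text import Tokenizer
    from keras.preprocessing.sequence import pad_sequences
    from keras.utils import to_categorical

    # Sketch of the preprocessing: map words to integer indices,
    # pad/truncate each document to maxlen tokens, and one-hot encode
    # the labels for the softmax/categorical_crossentropy setup above.
    tokenizer = Tokenizer(num_words=max_features)
    tokenizer.fit_on_texts(texts)
    X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=maxlen)
    y = to_categorical(labels, num_classes=n_labels)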

0 Answers:
