Using a trained model to predict classes for data with a different input_shape

Date: 2019-11-15 17:10:54

Tags: python tensorflow machine-learning keras nlp

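I have a saved model that I trained on a small corpus of text (messaging) data, and I am trying to use the same model to predict positive or negative sentiment (i.e. binary classification) on another corpus. I based my NLP model on the Google Developers ML guide, which you can review here (in case you find it useful; I used Option A in all cases).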

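I keep getting an input shape error. I understand the error means I have to reshape the input to match the expected shape. However, the data I want to predict on is not that size. The error message is: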

ValueError: Error when checking input: expected dropout_8_input to have shape (519,) but got array with shape (184,)

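The model expects shape (519,) because, during training, the corpus fed into the first Dropout layer (in TF-IDF-vectorized form) had print(x_train.shape) #(454, 519), i.e. 454 samples with 519 features each.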

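I am new to ML, but it does not make sense to me that all data I try to predict on after optimizing the model must have the same shape as the data used to train it. Has anyone run into a similar problem? Am I missing something about how to train a model so that it can predict on inputs of a different size? Or am I misunderstanding how a model is used for class prediction?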

I based the model training on the following functions:

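import tensorflow as tf  # needed for tf.keras.optimizers and tf.keras.callbacks below
from tensorflow.keras import models
from tensorflow.keras.layers import Dense, Dropout

# _get_last_layer_units_and_activation, get_num_classes and ngram_vectorize
# are helper functions that are not shown in this post.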

def mlp_model(layers, units, dropout_rate, input_shape, num_classes):
    """Creates an instance of a multi-layer perceptron model.

    # Arguments
        layers: int, number of `Dense` layers in the model.
        units: int, output dimension of the layers.
        dropout_rate: float, percentage of input to drop at Dropout layers.
        input_shape: tuple, shape of input to the model.
        num_classes: int, number of output classes.

    # Returns
        An MLP model instance.
    """
    op_units, op_activation = _get_last_layer_units_and_activation(num_classes)
    model = models.Sequential()
    model.add(Dropout(rate=dropout_rate, input_shape=input_shape))

#     print(input_shape)

    for _ in range(layers-1):
        model.add(Dense(units=units, activation='relu'))
        model.add(Dropout(rate=dropout_rate))

    model.add(Dense(units=op_units, activation=op_activation))
    return model





def train_ngram_model(data,
                      learning_rate=1e-3,
                      epochs=1000,
                      batch_size=128,
                      layers=2,
                      units=64,
                      dropout_rate=0.2):
    """Trains n-gram model on the given dataset.

    # Arguments
        data: tuples of training and test texts and labels.
        learning_rate: float, learning rate for training model.
        epochs: int, number of epochs.
        batch_size: int, number of samples per batch.
        layers: int, number of `Dense` layers in the model.
        units: int, output dimension of Dense layers in the model.
        dropout_rate: float: percentage of input to drop at Dropout layers.

    # Raises
        ValueError: If validation data has label values which were not seen
            in the training data.

    # Reference
        For tuning hyperparameters, please visit the following page for
        further explanation of each argument:
        https://developers.google.com/machine-learning/guides/text-classification/step-5
    """
    # Get the data.
    (train_texts, train_labels), (val_texts, val_labels) = data

    # Verify that validation labels are in the same range as training labels.
    num_classes = get_num_classes(train_labels)
    unexpected_labels = [v for v in val_labels if v not in range(num_classes)]
    if len(unexpected_labels):
        raise ValueError('Unexpected label values found in the validation set:'
                         ' {unexpected_labels}. Please make sure that the '
                         'labels in the validation set are in the same range '
                         'as training labels.'.format(
                             unexpected_labels=unexpected_labels))

    # Vectorize texts.
    x_train, x_val = ngram_vectorize(
        train_texts, train_labels, val_texts)

    # Create model instance.
    model = mlp_model(layers=layers,
                      units=units,
                      dropout_rate=dropout_rate,
                      input_shape=x_train.shape[1:],
                      # num_classes determines which activation fn to use
                      num_classes=num_classes)

    # Compile model with learning parameters.
    if num_classes == 2:
        loss = 'binary_crossentropy'
    else:
        loss = 'sparse_categorical_crossentropy'
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(optimizer=optimizer, loss=loss, metrics=['acc'])

    # Create callback for early stopping on validation loss. If the loss does
    # not decrease in two consecutive tries, stop training.
    callbacks = [tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=2)]

    # Train and validate model.
    history = model.fit(
            x_train,
            train_labels,
            epochs=epochs,
            callbacks=callbacks,
            validation_data=(x_val, val_labels),
            verbose=2,  # Logs once per epoch.
            batch_size=batch_size)

    # Print results.
    history = history.history
    print('Validation accuracy: {acc}, loss: {loss}'.format(
            acc=history['val_acc'][-1], loss=history['val_loss'][-1]))

    # Save model.
    model.save('MCTR2.h5')
    return history['val_acc'][-1], history['val_loss'][-1]

This gives me the following model architecture:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dropout (Dropout)            (None, 519)               0         
_________________________________________________________________
dense (Dense)                (None, 64)                33280     
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65        
=================================================================
Total params: 33,345
Trainable params: 33,345
Non-trainable params: 0
_________________________________________________________________
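
The parameter counts in the summary show where the 519 is baked in. As a minimal sketch (the helper name `dense_params` is made up for illustration), a Dense layer holds one weight per input-unit pair plus one bias per unit, so the first Dense layer's weight matrix only fits 519-wide inputs:

```python
def dense_params(n_inputs, n_units):
    """Number of trainable parameters in a fully connected layer:
    an (n_inputs x n_units) weight matrix plus one bias per unit."""
    return n_inputs * n_units + n_units


n_features = 519                        # second dimension of x_train.shape
hidden = dense_params(n_features, 64)   # the 'dense' row of the summary
output = dense_params(64, 1)            # the 'dense_1' row of the summary
total = hidden + output

print(hidden, output, total)            # 33280 65 33345

# A 184-wide input would need a differently sized weight matrix,
# which the saved model does not have.
print(dense_params(184, 64))            # 11840
```

The Dropout layers contribute no parameters, which matches the zeros in the summary.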

1 Answer:

Answer 0: (score: 0)

To make a dimension variable in TensorFlow, you specify it as None.

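The first dimension is the batch_size, which is why it is usually None. A batch of sequence data typically has shape (batch_size, sequence_length, num_features), so a single sequence is usually 2D: its length can vary, but the number of features per "token" is fixed.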

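You appear to be feeding the model 1D vectors, and Dense layers have a fixed input shape. If you want to model variable-length sequences, you have to build the model from layers that can accommodate them (e.g. convolutional, LSTM).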
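
If the Dense model is kept, another option may be to make the new corpus produce 519-wide vectors. This assumes the 184-wide array came from fitting a fresh vectorizer on the new corpus, which the question does not confirm. The sketch below is a toy bag-of-words stand-in for a TF-IDF vectorizer, written in plain Python; `fit_vocabulary` and `transform` are hypothetical names, not a library API. It shows that the feature width is set by the vocabulary learned at fit time, so reusing the training vocabulary keeps the width constant, while refitting changes it:

```python
def fit_vocabulary(texts):
    """Map each distinct token in the given texts to a column index."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def transform(texts, vocabulary):
    """Count tokens per text; tokens outside the vocabulary are ignored."""
    rows = []
    for text in texts:
        row = [0] * len(vocabulary)
        for tok in text.lower().split():
            if tok in vocabulary:
                row[vocabulary[tok]] += 1
        rows.append(row)
    return rows


train_texts = ["great service", "terrible service", "great food"]
new_texts = ["the food was great", "awful"]

vocab = fit_vocabulary(train_texts)       # fitted once, on training data only
x_train = transform(train_texts, vocab)
x_new = transform(new_texts, vocab)       # reuse the training vocabulary

refit_vocab = fit_vocabulary(new_texts)   # what refitting on new data does
x_refit = transform(new_texts, refit_vocab)

# Widths: training, new data with reused vocabulary, new data after refit.
print(len(x_train[0]), len(x_new[0]), len(x_refit[0]))
```

With scikit-learn the analogous pattern is to call `fit_transform` on the training texts and only `transform` on new texts, using the same fitted `TfidfVectorizer` (and the same fitted feature selector, if one was used), which means saving those objects alongside the model.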