Keras model loaded in Flask always predicts the same class

Date: 2019-05-06 11:27:06

Tags: python tensorflow machine-learning flask keras

Something strange is happening to me. I trained a sentiment analysis model with Keras, as follows:

max_fatures = 2000
tokenizer = Tokenizer(num_words=max_fatures, split=' ')
tokenizer.fit_on_texts(data)
X = tokenizer.texts_to_sequences(data)
X = pad_sequences(X)

with open('tokenizer.pkl', 'wb') as fid:
    _pickle.dump(tokenizer, fid)

le = LabelEncoder()
le.fit(["pos", "neg"])
y = le.transform(data_labels)
y = keras.utils.to_categorical(y)

embed_dim = 128
lstm_out = 196

model = Sequential()
model.add(Embedding(max_fatures, embed_dim, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

batch_size = 32
model.fit(X, y, epochs=10, batch_size=batch_size, verbose=2)

model.save('deep.h5')

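When I load it in another Python file, everything works fine. But when I load it in a Flask web application, every predicted class is positive. What is going wrong? Here is the code I use in the Flask web application: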

with open('./resources/model/tokenizer.pkl', 'rb') as handle:
    keras_tokenizer = _pickle.load(handle)

K.clear_session()
model = load_model('./resources/model/deep.h5')
model._make_predict_function()
session = K.get_session()
global graph
graph = tf.get_default_graph()
graph.finalize()

stop_words = []

with open('./resources/stopwords.txt', encoding="utf8") as f:
    stop_words = f.read().splitlines()

normalizer = Normalizer()
stemmer = Stemmer()
tokenizer = RegexpTokenizer(r'\w+')


def predict_class(text):
    tokens = tokenizer.tokenize(text)
    temp = ''

    for token in tokens:
        if token in stop_words:
            continue

        token = normalizer.normalize(token)
        token = stemmer.stem(token)
        temp += token + ' '

    if not temp.strip():
        return None

    text = keras_tokenizer.texts_to_sequences(temp.strip())
    text = pad_sequences(text, maxlen=41)

    le = LabelEncoder()
    le.fit(["pos", "neg"])

    with session.as_default():
        with graph.as_default():
            sentiment = model.predict_classes(text)
            return le.inverse_transform(sentiment)[0]

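2 answers:

Answer 0 (score: 0)

You are saving the model architecture, but not the weights!

Given that you are using Keras and its tokenizer, the best way I have found to load and reuse a model is to store the architecture and the tokenizer as JSON and the weights as h5: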

def save(model, tokenizer):
    # Save the trained weights
    model.save_weights('model_weights.h5')

    # Save the model architecture
    with open('model_architecture.json', 'w') as f:
        f.write(model.to_json())

    # Save the fitted tokenizer (passed in explicitly rather than read from a global)
    with open('tokenizer.json', 'w') as f:
        f.write(tokenizer.to_json())

Then load them in the Flask app as follows:

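# Import paths are an assumption; they depend on the installed Keras version
from keras.models import model_from_json
from keras_preprocessing.text import tokenizer_from_json


def models():
    # Tokenizer reconstruction from JSON file
    with open('models/tokenizer.json') as f:
        tokenizer = tokenizer_from_json(f.read())

    # Model reconstruction from JSON file
    with open('models/model_architecture.json', 'r') as f:
        model = model_from_json(f.read())

    # Load weights into the new model
    model.load_weights('models/model_weights.h5')

    return model, tokenizer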
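
Once the tokenizer is loaded, one thing that may be worth checking is how it is called. Keras's `Tokenizer.texts_to_sequences` iterates over its argument, so a bare string is treated as a sequence of one-character texts rather than as a single sentence. The sketch below imitates that behaviour in plain Python; `texts_to_sequences` here is a hypothetical stand-in, not the Keras implementation, and `word_index` is an illustrative vocabulary.

```python
# Hypothetical stand-in for a word-level tokenizer; NOT the Keras implementation.
def texts_to_sequences(texts, word_index):
    """Map each text in `texts` to a list of word ids, skipping unknown words."""
    sequences = []
    for text in texts:  # iterates over whatever is passed in
        words = text.lower().split()
        sequences.append([word_index[w] for w in words if w in word_index])
    return sequences


word_index = {"good": 1, "movie": 2, "bad": 3}  # illustrative vocabulary

# Bare string: every character becomes its own "text".
as_string = texts_to_sequences("good movie", word_index)

# One-element list: the whole sentence is a single text.
as_list = texts_to_sequences(["good movie"], word_index)

print(len(as_string), as_string[:3])
print(len(as_list), as_list)
```

If the same thing happens in the question's `predict_class`, where `temp.strip()` is passed directly, the padded input would consist mostly of zeros regardless of the sentence, which could explain a constant prediction. Wrapping the string in a list, as in `[temp.strip()]`, avoids this.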

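Answer 1 (score: 0)

Yes, I ran into the same problem, although in my case the predictions were correct. I think the ".h5" file with the model architecture and weights is not enough on its own: you also need the tokenizer, because it holds the word index of all the unique tokens, that is, the words the model was trained on.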
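
A minimal sketch of why the word index has to travel with the model, using plain Python instead of Keras. The helpers `build_word_index` and `encode` are hypothetical and only imitate the idea of a frequency-ranked word index; the sample sentences are made up.

```python
import json


def build_word_index(texts):
    """Assign ids by descending frequency; ties keep first-seen order."""
    counts = {}
    for text in texts:
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    ordered = sorted(counts, key=lambda w: -counts[w])  # sorted() is stable
    return {word: i + 1 for i, word in enumerate(ordered)}


def encode(text, word_index):
    """Turn a sentence into word ids, dropping words missing from the index."""
    return [word_index[w] for w in text.lower().split() if w in word_index]


# Index built at training time
train_index = build_word_index(["good movie", "bad movie"])

# Same index after a JSON round trip (what saving and loading should give)
restored_index = json.loads(json.dumps(train_index))

# Index rebuilt from different text (what a fresh tokenizer would give)
refit_index = build_word_index(["bad acting", "bad plot"])

same = encode("bad movie", restored_index)
different = encode("bad movie", refit_index)
print(same, different)
```

The restored index yields the same ids the model saw during training, while the refitted one maps the same sentence to different ids, so the model would receive input it was never trained to interpret.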

So I strongly recommend the previous post in this thread by [Eudald Arranz](https://stackoverflow.com/users/11153431/eudald-arranz): save the model architecture as JSON and the weights in an h5 file.

Because this actually worked for me.

Thanks, Eudald