I am trying to deploy an sklearn text classification pipeline with Flask. Whenever I try to load the pickled model, I get the error mentioned in the title. Here is my file structure:

webapp/
├── model/
│   └── book_classifier.pkl
├── templates/
│   └── main.html
├── app.py
└── preprocessing.py

book_classifier.pkl is the pickled version of the following pipeline, fitted on my data:

classifier = KNeighborsClassifier()
pipe = Pipeline([('clean_transformer', clean_transformer()),
                 ('vectorizer', tfidf_vector),
                 ('classifier', classifier)])
fitted_pipe = pipe.fit(X,y)
joblib.dump(fitted_pipe, 'book_classifier.pkl', compress=1)

Below is the code for preprocessing.py, which contains the necessary text preprocessing steps (the tokenizer, followed by the tfidf_vector and clean_transformer you see above):

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.base import TransformerMixin
from sklearn.pipeline import Pipeline
import spacy
from spacy.lang.en import English
parser = English()
from spacy.lang.en.stop_words import STOP_WORDS
stop_words = spacy.lang.en.stop_words.STOP_WORDS
import string
punctuations = string.punctuation

# tokenizer
def spacy_tokenizer(sentence):
    mytokens = parser(sentence)
    mytokens = [ word.lemma_.lower().strip() if word.lemma_ != "-PRON-" else word.lower_ for word in mytokens ]
    mytokens = [ word for word in mytokens if word not in stop_words and word not in punctuations ]
    return mytokens

# vectorizers
bow_vector = CountVectorizer(tokenizer = spacy_tokenizer, ngram_range=(1,1))
tfidf_vector = TfidfVectorizer(tokenizer = spacy_tokenizer)

# transformer
def clean_text(text):
    return text.strip().lower()

class clean_transformer(TransformerMixin):
    def transform(self, X, **transform_params):
        return [clean_text(text) for text in X]
    def fit(self, X, y=None, **fit_params):
        return self
    def get_params(self, deep=True):
        return {}

Finally, here is the code for app.py:

import flask
import joblib
import pandas as pd
from preprocessing import *

model = joblib.load(open('model/book_classifier.pkl', 'rb'))
app = flask.Flask(__name__, template_folder='templates')

@app.route('/', methods=['GET', 'POST'])
def main():
    if flask.request.method == 'GET':
        return(flask.render_template('main.html'))
    if flask.request.method == 'POST':
        title = flask.request.form['title']
        booktext = flask.request.form['booktext']
        prediction = model.predict(booktext)
        return flask.render_template('main.html',
                                     original_input={'Book Title':title},
                                     result=prediction), print(prediction)

if __name__ == '__main__':
    app.run()

As mentioned, the error occurs on this line:

model = joblib.load(open('model/book_classifier.pkl', 'rb'))

Here is the full error:
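I am new to deploying with Flask and I am not sure what is going on; I am having trouble interpreting the error message. Note that the problem persists whether I import from preprocessing.py or put that code directly in app.py. Any help would be greatly appreciated.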
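
For context on the load step: joblib builds on pickle, and pickle stores a custom class such as clean_transformer as a reference (module path plus class name) rather than as code. The sketch below only illustrates that mechanism using the standard library. The module name demo_training_module is a hypothetical stand-in for wherever the class lived when the pipeline was dumped, and since the full traceback is not shown, this is not confirmed to be the cause of the error here.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the module that defined the class at training time.
MODULE_NAME = "demo_training_module"


def _transform(self, X):
    # Same cleaning idea as clean_transformer.transform in preprocessing.py.
    return [text.strip().lower() for text in X]


# Build a throwaway module holding a clean_transformer class,
# mimicking the environment in which the pipeline was dumped.
training_module = types.ModuleType(MODULE_NAME)
training_module.clean_transformer = type(
    "clean_transformer", (), {"transform": _transform, "__module__": MODULE_NAME}
)
sys.modules[MODULE_NAME] = training_module

# The payload records the module path and class name, not the class body.
payload = pickle.dumps(training_module.clean_transformer())
stores_reference = MODULE_NAME.encode() in payload and b"clean_transformer" in payload

# Simulate a serving process in which that module path cannot be imported.
del sys.modules[MODULE_NAME]
try:
    pickle.loads(payload)
    load_error = None
except (ImportError, AttributeError) as exc:
    load_error = exc

# Once the same module path is importable again, loading succeeds.
sys.modules[MODULE_NAME] = training_module
restored = pickle.loads(payload)
print(stores_reference, type(load_error).__name__, restored.transform(["  Some Title "]))
```

If this mechanism is what applies here, one option would be to have the training script import clean_transformer and spacy_tokenizer from preprocessing.py before calling joblib.dump, so that the pickle refers to the preprocessing module that app.py can also import.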