Question

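I trained a Naive Bayes model with scikit-learn to classify articles in my web application. To avoid retraining the model every time, I want to save it and deploy it to the application later. When I searched for this problem, many people recommended the pickle library.

I have this model: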

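import pickle
import os
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# vect_tokenizer and lemmatizer are defined elsewhere (not shown in the question)
def custom_tokenizer (doc) :
    tokens = vect_tokenizer(doc)
    return [lemmatizer.lemmatize(token) for token in tokens]

tfidf = TfidfVectorizer(tokenizer = custom_tokenizer,stop_words = "english")
clf = MultinomialNB()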

I ran tfidf.fit_transform() and trained clf. In the end I got a model and saved the clf classifier with the following code:

dest = os.path.join('classifier','pkl_object')
f = open(os.path.join(dest,'classifier.pkl'),'wb')
pickle.dump(best_classifier,f,protocol = 4)
f.close()

I also tried to save the vectorizer to a file in the same way.

f =  open(os.path.join(dest,'vect.pkl'),'wb')
pickle.dump(custom_tokenizer,f,protocol = 4)
pickle.dump(best_vector,f,protocol = 4)
f.close()

There was no error. But when I tried to load the files, this error message popped up.

import pickle
import os

with open(os.path.join('pkl_object','classifier.pkl'),'rb') as file :
    clf = pickle.load(file)

with open(os.path.join('pkl_vect','vect.pkl'),'rb') as file:
    vect = pickle.load(file)

Error message:

AttributeError                            Traceback (most recent call last)
<ipython-input-55-d4b562870a02> in <module>()
     11 
     12 with open(os.path.join('pkl_vect','vect.pkl'),'rb') as file:
---> 13     vect = pickle.load(file)
     14 
     15 '''

AttributeError: Can't get attribute 'custom_tokenizer' on <module '__main__'>

I think the pickle library cannot store functions properly. How can I serialize a custom TfidfVectorizer to a file?

Answer 1

In the second program, also include:

def custom_tokenizer (doc) :
    tokens = vect_tokenizer(doc)
    return [lemmatizer.lemmatize(token) for token in tokens]

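This is needed because pickle does not store a function's code or any information about how to construct a class or object; it stores only a reference to it by module and name. That is what this line in the error log shows: AttributeError: Can't get attribute 'custom_tokenizer' on <module '__main__'>. At load time, __main__ has no custom_tokenizer, so pickle does not know what it is. The names the function uses (vect_tokenizer and lemmatizer) must be available in the second program as well. See the linked question for a better understanding.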
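To make the by-reference behaviour concrete, here is a minimal sketch using only the standard library. The tokenizer body and the dictionary standing in for the fitted vectorizer are hypothetical placeholders, not the question's real objects. The sketch also shows that when two objects are dumped into one file, as with vect.pkl in the question, each pickle.load call returns only one of them, in the order they were written.

```python
import io
import pickle

def custom_tokenizer(doc):
    # Hypothetical stand-in for the question's tokenizer.
    return doc.lower().split()

# Pickling a function records its module and name, not its body.
payload = pickle.dumps(custom_tokenizer, protocol=4)
name_in_payload = b"custom_tokenizer" in payload

# While the name is defined, loading resolves to the very same function object.
restored = pickle.loads(payload)
same_object = restored is custom_tokenizer

# Two dumps into one stream need two loads, in the same order.
buffer = io.BytesIO()
pickle.dump(custom_tokenizer, buffer, protocol=4)
pickle.dump({"stand_in": "vectorizer"}, buffer, protocol=4)  # placeholder object
buffer.seek(0)
first = pickle.load(buffer)   # the tokenizer function
second = pickle.load(buffer)  # the second object written
first_is_tokenizer = first is custom_tokenizer

# Simulate the second program, where the function was never defined.
del custom_tokenizer
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = str(exc)

print(name_in_payload, same_object, first_is_tokenizer)
print(load_error)
```

With the question's files, this would mean the first pickle.load on vect.pkl returns the function rather than the vectorizer, so a second pickle.load on the same open file is needed to get the vectorizer itself.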

Save and load scikit-learn machine learning models and functions
