I want to save and load a CountVectorizer vocabulary. Here is my code:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 1500)
Cv_vec = cv.fit(X['review'])
X_cv=Cv_vec.transform(X['review']).toarray()
dictionary_filepath='CV_dict'
pickle.dump(Cv_vec.vocabulary_, open(dictionary_filepath, 'w'))
It gives me this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-407-3a9b06f969a9> in <module>()
1 dictionary_filepath='CV_dict'
----> 2 pickle.dump(Cv_vec.vocabulary_, open(dictionary_filepath, 'w'))
TypeError: write() argument must be str, not bytes
I want to save the CountVectorizer's vocabulary and load it again later. Can anyone help?
Answer 0 (score: 0)
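When pickling an object, open the file in binary mode: pickle writes bytes, and a file opened with 'w' only accepts str, which is what the TypeError is complaining about. Also try using a context manager so the file is closed for you, i.e.: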
import pickle

from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(max_features=1500)
Cv_vec = cv.fit(X['review'])
X_cv = Cv_vec.transform(X['review']).toarray()

dictionary_filepath = 'CV_dict.pkl'
# 'wb' opens the file in binary mode, which pickle.dump requires
with open(dictionary_filepath, 'wb') as fout:
    pickle.dump(Cv_vec.vocabulary_, fout)
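
The question also asks how to load the vocabulary back, which mirrors the save step: open the file with 'rb' and call pickle.load. Below is a minimal sketch of the round trip using only the standard library. The helper names save_vocabulary and load_vocabulary, the temporary path, and the small example_vocabulary dict (a stand-in for Cv_vec.vocabulary_) are assumptions made for illustration and are not part of the original code.

```python
import pickle
import tempfile
from pathlib import Path


def save_vocabulary(vocabulary, path):
    """Pickle a term -> column-index mapping to `path` (binary mode)."""
    with open(path, "wb") as fout:
        pickle.dump(vocabulary, fout)


def load_vocabulary(path):
    """Read back a mapping written by save_vocabulary (binary mode)."""
    with open(path, "rb") as fin:
        return pickle.load(fin)


# Hypothetical stand-in for Cv_vec.vocabulary_, which is a dict
# mapping each term to its column index in the count matrix.
example_vocabulary = {"good": 0, "movie": 1, "terrible": 2}

# Write to a temporary directory so the sketch leaves no files behind
# in the working directory.
vocab_path = Path(tempfile.mkdtemp()) / "CV_dict.pkl"
save_vocabulary(example_vocabulary, vocab_path)
loaded_vocabulary = load_vocabulary(vocab_path)
```

With scikit-learn installed, the loaded dictionary can then be passed to the constructor as CountVectorizer(vocabulary=loaded_vocabulary). A vectorizer built with a fixed vocabulary can call transform on new text without being fit again.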