Error when using a persisted sklearn.feature_extraction.text.TfidfVectorizer

Asked: 2016-05-26 21:08:36

Tags: python scikit-learn joblib

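I persisted a TfidfVectorizer using the joblib module. The object I passed to its fit_transform method was a list of strings, and the resulting matrix has 263744 columns.

Now, when I pass a list of strings to the transform method of the reloaded vectorizer, I get the error below.

Any clues?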

      File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.py", line 1334, in transform
        return self._tfidf.transform(X, copy=False)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.py", line 1037, in transform
        X = X * self._idf_diag
      File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/base.py", line 318, in __mul__
        return self._mul_sparse_matrix(other)
      File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 487, in _mul_sparse_matrix
        other = self.__class__(other)  # convert to this format
      File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 31, in __init__
        arg1 = arg1.asformat(self.format)
      File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/base.py", line 219, in asformat
        return getattr(self,'to' + format)()
      File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/dia.py", line 241, in tocsr
        return self.tocoo().tocsr()
      File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/dia.py", line 249, in tocoo
        num_offsets, offset_len = self.data.shape
    AttributeError: 'NDArrayWrapper' object has no attribute 'shape'

1 answer:

Answer 0 (score: 0)

Assuming you saved a trained transformer or pipeline to disk and then reloaded it before seeing the error, you can:

  1. Try saving the original (working) object again with joblib.dump, passing the compress keyword argument with an integer value greater than 0:

    _ = joblib.dump(python_object, persisted_file_name, compress=3)
    
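  2. If you move the persisted file to a new location, be sure to copy all of its files. If the object is large, joblib splits it into several files, for example: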

    persisted_model.joblib.pkl
    persisted_model.joblib.pkl_01.npy
    persisted_model.joblib.pkl_02.npy
    
  3. See the joblib docs for details.
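
As a minimal sketch of steps 1 and 2 together, the snippet below fits a vectorizer, dumps it with compress=3, copies every file that shares the dump's filename prefix to a second directory, and reloads it from there. The toy corpus, the temporary directories, and the variable names are made up for illustration; persisted_model.joblib.pkl is borrowed from the listing above. Depending on the joblib version, the dump may produce one file or several, so the glob copies whatever is there.

```python
import glob
import os
import shutil
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus; the question used a far larger one.
train_docs = ["the cat sat on the mat", "the dog chased the cat", "dogs and cats"]
new_docs = ["a cat and a dog"]

vectorizer = TfidfVectorizer()
train_matrix = vectorizer.fit_transform(train_docs)

# Illustrative locations: one directory to dump into, one to "move" to.
src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()
model_path = os.path.join(src_dir, "persisted_model.joblib.pkl")

# Step 1: dump with a compression level greater than 0.
# joblib.dump returns the list of file names it wrote.
written = joblib.dump(vectorizer, model_path, compress=3)

# Step 2: copy every file that starts with the model's path,
# so any companion files (e.g. *_01.npy) travel with the main pickle.
for path in glob.glob(model_path + "*"):
    shutil.copy(path, dst_dir)

# Reload from the new location and transform unseen text.
restored = joblib.load(os.path.join(dst_dir, os.path.basename(model_path)))
new_matrix = restored.transform(new_docs)

# The reloaded vectorizer should produce the same number of columns.
print(train_matrix.shape, new_matrix.shape)
```

If the reloaded object still fails in transform, comparing the contents of the source and destination directories is a quick way to check whether a companion file was left behind.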