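I'm new to the sklearn library, so I'm hoping to get some help here. I've spent a few days reading similar threads to try to solve this, such as Dimension error (which this post was marked as a duplicate of), but that thread is entirely about training and testing a model, whereas my problem is about running an already trained model on new data, so it's different.
I've built a sentiment script using a naive Bayes model. I trained the model on preprocessed text data and tested it, but now I want to check the model's predictions on other preprocessed data, and no matter what I do I keep getting this error: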
ValueError Traceback (most recent call last)
<ipython-input-112-213cbee0df96> in <module>
16 pickle.dump(clf, open(filename, 'wb'))
17 loaded_model = pickle.load(open(filename, 'rb'))
---> 18 predicted= loaded_model.predict(text_counts)
~\Anaconda3\lib\site-packages\sklearn\naive_bayes.py in predict(self, X)
64 Predicted target values for X
65 """
---> 66 jll = self._joint_log_likelihood(X)
67 return self.classes_[np.argmax(jll, axis=1)]
68
~\Anaconda3\lib\site-packages\sklearn\naive_bayes.py in _joint_log_likelihood(self, X)
729
730 X = check_array(X, accept_sparse='csr')
--> 731 return (safe_sparse_dot(X, self.feature_log_prob_.T) +
732 self.class_log_prior_)
733
~\Anaconda3\lib\site-packages\sklearn\utils\extmath.py in safe_sparse_dot(a, b, dense_output)
166 """
167 if sparse.issparse(a) or sparse.issparse(b):
--> 168 ret = a * b
169 if dense_output and hasattr(ret, "toarray"):
170 ret = ret.toarray()
~\Anaconda3\lib\site-packages\scipy\sparse\base.py in __mul__(self, other)
513
514 if other.shape[0] != self.shape[1]:
--> 515 raise ValueError('dimension mismatch')
516
517 result = self._mul_multivector(np.asarray(other))
ValueError: dimension mismatch
Can anyone tell me what I'm doing wrong? Here is my full script:
import pickle

import pandas as pd
from nltk.tokenize import RegexpTokenizer
from sklearn.feature_extraction.text import CountVectorizer

# Load the processed tweets and keep the first rows as the "new" data
data = pd.read_csv('airlinee_corrected.csv')
new = data[['Corrected_Tweet']].copy()
new = new.loc[0:10]

# Build a bag-of-words matrix from the new data
token = RegexpTokenizer(r'[a-zA-Z0-9]+')
cv = CountVectorizer(lowercase=True, stop_words='english', ngram_range=(1, 1), tokenizer=token.tokenize)
text_counts = cv.fit_transform(new['Corrected_Tweet'])

# clf is the naive Bayes classifier trained earlier (training code not shown here)
filename = 'finalized_model_pickle.sav'
pickle.dump(clf, open(filename, 'wb'))
loaded_model = pickle.load(open(filename, 'rb'))
predicted = loaded_model.predict(text_counts)  # raises ValueError: dimension mismatch
This is the dataset I used to train the model, and I'm also using the same data as the "new" data to test the model: airlinee_corrected.csv
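For context, here is a minimal sketch of what is probably going on, using a few made-up sentences instead of the airline data. The texts, labels, and variable names are hypothetical, and it assumes the classifier is a MultinomialNB, which the script does not actually show. Calling fit_transform on new text builds a fresh vocabulary, so the number of columns no longer matches what the model was trained on. Pickling the fitted vectorizer together with the model and calling transform keeps the columns aligned.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-ins for the training tweets and their labels (hypothetical data)
train_texts = [
    "great flight and friendly crew",
    "terrible delay and lost luggage",
    "friendly staff great service",
    "lost my bag terrible experience",
]
train_labels = ["positive", "negative", "positive", "negative"]
new_texts = ["great crew", "terrible luggage handling"]

# Fit the vectorizer and the classifier on the training data
cv = CountVectorizer()
X_train = cv.fit_transform(train_texts)
clf = MultinomialNB().fit(X_train, train_labels)

# Refitting a vectorizer on the new texts learns a different vocabulary,
# so the column count differs from the training matrix
refit_counts = CountVectorizer().fit_transform(new_texts)
mismatch = refit_counts.shape[1] != X_train.shape[1]

# Save the fitted vectorizer together with the model, then restore both
blob = pickle.dumps({"vectorizer": cv, "model": clf})
restored = pickle.loads(blob)

# transform (not fit_transform) maps new text onto the training vocabulary
new_counts = restored["vectorizer"].transform(new_texts)
predicted = restored["model"].predict(new_counts)
print(mismatch, new_counts.shape, list(predicted))
```

If this is the cause, the same idea would apply to the script above: save the CountVectorizer that was fitted during training and call its transform method on the new tweets, rather than creating and fitting a new one.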