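I followed this tutorial for sentiment analysis: https://stackabuse.com/python-for-nlp-sentiment-analysis-with-scikit-learn/
I'm not a professional, though, so I don't understand every step in detail. Now I want to apply the model to new data using this tutorial: https://stackabuse.com/scikit-learn-save-and-restore-models/
But at the line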
score = pickle_model.score(Xtest, Ytest)
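I get a ValueError: could not convert string to float: 'positive' ('positive' is one of the labels from the sentiment analysis above). To my surprise, the error occurs even when I pass X_train and y_train (from the first tutorial), whereas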
text_classifier.fit(X_train, y_train)
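works without any error. So I assume that fit() does something that score() does not, and that this causes the problem. However, I don't know how to fix it.
Here is the full error message:
ValueError Traceback (most recent call last)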
<ipython-input-210-070f6faef44c> in <module>
34 print(len(X_train))
35 print(len(y_train))
---> 36 score = pickle_model.score(X_train, y_train)
37 print("Test score: {0:.2f} %".format(100 * score))
38
~\Anaconda3\lib\site-packages\sklearn\base.py in score(self, X, y, sample_weight)
408 y_pred = self.predict(X)
409 # XXX: Remove the check in 0.23
--> 410 y_type, _, _, _ = _check_reg_targets(y, y_pred, None)
411 if y_type == 'continuous-multioutput':
412 warnings.warn("The default value of multioutput (not exposed in "
~\Anaconda3\lib\site-packages\sklearn\metrics\regression.py in _check_reg_targets(y_true, y_pred, multioutput)
76 """
77 check_consistent_length(y_true, y_pred)
---> 78 y_true = check_array(y_true, ensure_2d=False)
79 y_pred = check_array(y_pred, ensure_2d=False)
80
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
494 try:
495 warnings.simplefilter('error', ComplexWarning)
--> 496 array = np.asarray(array, dtype=dtype, order=order)
497 except ComplexWarning:
498 raise ValueError("Complex data not supported\n"
~\Anaconda3\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
536
537 """
--> 538 return array(a, dtype, copy=False, order=order)
539
540
ValueError: could not convert string to float: 'positive'
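The traceback above passes through `_check_reg_targets` in `sklearn/metrics/regression.py`, which suggests that the `score()` being called is the regressor version (it computes R² and needs numeric targets), so `pickle_model` may not be the same kind of model as `text_classifier`. Below is a minimal sketch of that difference. The random features, the two toy labels, and the choice of `RandomForestRegressor` are assumptions made for illustration, not the data or models from the tutorials.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Made-up data: 20 samples, 3 numeric features, string class labels.
rng = np.random.RandomState(0)
X = rng.rand(20, 3)
labels = np.array(["positive", "negative"] * 10, dtype=object)

# A classifier accepts string labels in both fit() and score();
# its score() is the mean accuracy.
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, labels)
clf_score = clf.score(X, labels)
print("classifier accuracy:", clf_score)

# A regressor (here fitted on made-up numeric targets) computes R^2 in
# score(), so it tries to convert the string labels to floats and fails.
reg = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, rng.rand(20))
reg_error = None
try:
    reg.score(X, labels)
except ValueError as exc:
    reg_error = exc
print("regressor score() raised:", type(reg_error).__name__)
```

If this is what is happening, printing `type(pickle_model)` would show whether the restored object is actually the fitted `RandomForestClassifier`.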
Here is the code snippet where the error occurs:
vectorizer = TfidfVectorizer (max_features=2500, min_df=1, max_df=1, stop_words=stopwords.words('english'))
chat_data = vectorizer.fit_transform(chat_data).toarray()
X_train, X_test, y_train, y_test = train_test_split(chat_data, chat_labels, test_size=0.2, random_state=0)
text_classifier = RandomForestClassifier(n_estimators=200, random_state=0)
text_classifier.fit(X_train, y_train)
predictions = text_classifier.predict(X_test)
X_train = np.array(X_train).reshape((-1,1))
y_train = np.array(y_train).reshape((-1,1))
print(len(X_train))
print(len(y_train))
score = pickle_model.score(X_train, y_train)
print("Test score: {0:.2f} %".format(100 * score))
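Two things stand out in the snippet above: the object being scored is `pickle_model` rather than the fitted `text_classifier`, and `reshape((-1,1))` turns the feature matrix into a single column, so it no longer has the shape the model was trained on. A minimal sketch of one way the save, restore, and score steps could fit together is shown below. It assumes a tiny made-up corpus, default `TfidfVectorizer` settings, and an in-memory pickle round trip instead of a file; it is not the exact code from either tutorial.

```python
import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus standing in for chat_data / chat_labels.
texts = [
    "great service and friendly staff",
    "terrible delay and rude staff",
    "really great flight",
    "awful experience, terrible food",
    "friendly crew, great seats",
    "rude agent and awful delay",
    "great food and great crew",
    "terrible seats and rude crew",
]
labels = ["positive", "negative"] * 4

# Turn the texts into TF-IDF features and split them.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(texts).toarray()
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)

# Fit the classifier on string labels, as in the question.
text_classifier = RandomForestClassifier(n_estimators=50, random_state=0)
text_classifier.fit(X_train, y_train)

# Pickle the *same* fitted classifier and restore it.
restored = pickle.loads(pickle.dumps(text_classifier))

# Score with the arrays exactly as train_test_split returned them
# (no reshape), so the number of features matches the training data.
score = restored.score(X_test, y_test)
print("Test score: {0:.2f} %".format(100 * score))
```

For genuinely new text, the fitted `vectorizer` would also need to be saved and reused through `vectorizer.transform(...)`, so that the new features line up with the columns the classifier was trained on.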