Python: could not convert string to float

Asked: 2019-12-19 15:04:31

Tags: python scikit-learn sentiment-analysis

I followed this tutorial for sentiment analysis: https://stackabuse.com/python-for-nlp-sentiment-analysis-with-scikit-learn/

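I am not an expert, though, so I don't understand the details of every step. Now I want to apply the model to new data, using this tutorial on saving and restoring models: https://stackabuse.com/scikit-learn-save-and-restore-models/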

But at the line

  score = pickle_model.score(Xtest, Ytest)

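I get a ValueError: could not convert string to float: 'positive' ('positive' is one of the labels from the earlier sentiment analysis). To my surprise, the error occurs even when I pass X_train and y_train (from the first tutorial), although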

text_classifier.fit(X_train, y_train)

工作正常,没有任何错误。因此,我假设fit()方法所做的事情是score()方法做不到的,这会造成问题。但是,我不知道如何解决它。
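One way to test that assumption is to call both methods on the same classifier. The following is a minimal sketch with made-up data (the feature matrix `X` and the label list `y` are hypothetical stand-ins, not the tutorial's data). For scikit-learn classifiers, `score()` returns the mean accuracy of `predict(X)` against `y`, so string labels should be accepted by `fit()` and `score()` alike.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Hypothetical stand-in data: random features with string labels,
# similar in kind to the sentiment labels in the question.
rng = np.random.RandomState(0)
X = rng.rand(40, 5)
y = ["positive", "negative"] * 20

clf = RandomForestClassifier(n_estimators=20, random_state=0)
clf.fit(X, y)  # fit() accepts string labels

# The classifier remembers the distinct labels it was trained on.
labels_seen = list(clf.classes_)

# score() on a classifier is the accuracy of predict(X) against y,
# so it compares labels and never converts them to float.
score = clf.score(X, y)
manual = accuracy_score(y, clf.predict(X))

print(labels_seen)
print(score == manual)
```

If this runs without error, the difference is probably not between `fit()` and `score()` themselves, but between the object that was fitted (`text_classifier`) and the object being scored (`pickle_model`).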

Here is the full error message:

ValueError                                Traceback (most recent call last)

<ipython-input-210-070f6faef44c> in <module>
     34 print(len(X_train))
     35 print(len(y_train))
---> 36 score = pickle_model.score(X_train, y_train)
     37 print("Test score: {0:.2f} %".format(100 * score))
     38 

~\Anaconda3\lib\site-packages\sklearn\base.py in score(self, X, y, sample_weight)
    408         y_pred = self.predict(X)
    409         # XXX: Remove the check in 0.23
--> 410         y_type, _, _, _ = _check_reg_targets(y, y_pred, None)
    411         if y_type == 'continuous-multioutput':
    412             warnings.warn("The default value of multioutput (not exposed in "

~\Anaconda3\lib\site-packages\sklearn\metrics\regression.py in _check_reg_targets(y_true, y_pred, multioutput)
     76     """
     77     check_consistent_length(y_true, y_pred)
---> 78     y_true = check_array(y_true, ensure_2d=False)
     79     y_pred = check_array(y_pred, ensure_2d=False)
     80 

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    494             try:
    495                 warnings.simplefilter('error', ComplexWarning)
--> 496                 array = np.asarray(array, dtype=dtype, order=order)
    497             except ComplexWarning:
    498                 raise ValueError("Complex data not supported\n"

~\Anaconda3\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
    536 
    537     """
--> 538     return array(a, dtype, copy=False, order=order)
    539 
    540 

ValueError: could not convert string to float: 'positive'
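The traceback passes through `_check_reg_targets` in `sklearn\metrics\regression.py`, which suggests that the `score()` being called is the regressor version (an R² score that needs numeric targets) rather than the classifier version. The sketch below reproduces that situation under stated assumptions: `other_model` is a hypothetical stand-in for whatever `pickle_model` actually contains, and the data is made up. `is_classifier` and `is_regressor` from `sklearn.base` can be used to check which kind of estimator was loaded.

```python
import numpy as np
from sklearn.base import is_classifier, is_regressor
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data.
rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y_text = ["positive", "negative"] * 10

text_classifier = RandomForestClassifier(n_estimators=10, random_state=0)
text_classifier.fit(X, y_text)

# Assumption: a regressor trained on unrelated numeric targets,
# standing in for a model restored from a different pickle file.
other_model = LinearRegression().fit(X, rng.rand(20))

# (is_classifier, is_regressor) for each estimator.
kinds = {
    "text_classifier": (is_classifier(text_classifier), is_regressor(text_classifier)),
    "other_model": (is_classifier(other_model), is_regressor(other_model)),
}
print(kinds)

# Scoring a regressor against string labels fails, because R^2 needs
# numeric targets. The exact message depends on the scikit-learn version.
try:
    other_model.score(X, y_text)
    regressor_error = None
except (ValueError, TypeError) as exc:
    regressor_error = exc
print(type(regressor_error).__name__, regressor_error)
```

Running `print(type(pickle_model))` in the original session would show whether this is what happened there.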

Here is the code segment where the error occurs:

vectorizer = TfidfVectorizer(max_features=2500, min_df=1, max_df=1, stop_words=stopwords.words('english'))
chat_data = vectorizer.fit_transform(chat_data).toarray()

X_train, X_test, y_train, y_test = train_test_split(chat_data, chat_labels, test_size=0.2, random_state=0)

text_classifier = RandomForestClassifier(n_estimators=200, random_state=0)
text_classifier.fit(X_train, y_train)
predictions = text_classifier.predict(X_test)
X_train = np.array(X_train).reshape((-1,1))
y_train = np.array(y_train).reshape((-1,1))
print(len(X_train))
print(len(y_train))
score = pickle_model.score(X_train, y_train)
print("Test score: {0:.2f} %".format(100 * score))
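Note that the snippet above fits `text_classifier` but scores `pickle_model`, and that `reshape((-1,1))` turns the training matrix into a single column with many more rows than there are labels. A possible way around both, sketched here under assumptions, is to bundle the vectorizer and the classifier in one `Pipeline`, pickle that object, and score the restored copy on unreshaped data. The texts and labels are invented for illustration, and `pickle.dumps`/`pickle.loads` stand in for writing to and reading from a file.

```python
import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for chat_data and chat_labels.
texts = [
    "great service", "terrible delay", "love it",
    "awful support", "really great flight", "terrible and awful",
] * 5
labels = [
    "positive", "negative", "positive",
    "negative", "positive", "negative",
] * 5

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0
)

# Vectorizer and classifier travel together, so new text is transformed
# with the same vocabulary that was learned during training.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(X_train, y_train)

# Round-trip through pickle (in memory here, instead of a file).
restored = pickle.loads(pickle.dumps(model))

# Score the restored model on held-out data, without any reshape.
score = restored.score(X_test, y_test)
print("Test score: {0:.2f} %".format(100 * score))

# Predict on new raw text.
new_pred = restored.predict(["great support"])
print(new_pred[0])
```

Because the restored object is the same classifier that was fitted on string labels, `score()` compares labels directly and no conversion to float is attempted.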

0 answers:

No answers yet