Sklearn: TypeError when fitting data to a logistic regression model

Time: 2021-05-22 12:00:15

Tags: python scikit-learn sklearn-pandas

I get the error below when fitting a LogisticRegression on the features produced by fit_transform:

from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer = TfidfVectorizer()

X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)

X_train_tfidf.shape

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(solver = 'lbfgs')

clf.fit(X_train_tfidf,y_train)

I went through this thread, LabelEncoder: TypeError: '>' not supported between instances of 'float' and 'str', but it did not help either. Any help would be appreciated.

TypeError: '<' not supported between instances of 'float' and 'str'

Following the thread linked above, I checked for null values, and X_train does not have any:

X_train.isnull().value_counts()

False    2584
Name: Headline, dtype: int64
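Note that the check above only covers X_train (the Headline column). A minimal sketch of running the same check on the labels as well is shown here. The data, the "Category" label name, and the variable names missing_in_X, missing_in_y, keep, X_clean and y_clean are hypothetical stand-ins, not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's X_train / y_train.
X_train = pd.Series(
    ["markets rally", "storm hits coast", "team wins final"], name="Headline"
)
y_train = pd.Series(["business", np.nan, "sports"], name="Category", dtype=object)

# Count missing values in the features and in the labels separately.
missing_in_X = int(X_train.isna().sum())
missing_in_y = int(y_train.isna().sum())
print(missing_in_X, missing_in_y)

# Keep only the rows whose label is present. The same boolean mask is
# applied to both Series so features and labels stay aligned.
keep = y_train.notna()
X_clean = X_train[keep]
y_clean = y_train[keep]
```

If the labels do turn out to contain missing values, filtering both Series with one mask before calling fit_transform and fit is one possible way to proceed. Whether dropping those rows is appropriate depends on the dataset.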




---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-65-e676010d2b44> in <module>
      3 clf = LogisticRegression(solver = 'lbfgs')
      4 
----> 5 clf.fit(X_train_tfidf,y_train)

~/Desktop/Anaconda/anaconda3/envs/nlp_course/lib/python3.7/site-packages/sklearn/linear_model/logistic.py in fit(self, X, y, sample_weight)
   1284         X, y = check_X_y(X, y, accept_sparse='csr', dtype=_dtype, order="C",
   1285                          accept_large_sparse=solver != 'liblinear')
-> 1286         check_classification_targets(y)
   1287         self.classes_ = np.unique(y)
   1288         n_samples, n_features = X.shape

~/Desktop/Anaconda/anaconda3/envs/nlp_course/lib/python3.7/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
    166     y : array-like
    167     """
--> 168     y_type = type_of_target(y)
    169     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    170                       'multilabel-indicator', 'multilabel-sequences']:

~/Desktop/Anaconda/anaconda3/envs/nlp_course/lib/python3.7/site-packages/sklearn/utils/multiclass.py in type_of_target(y)
    285         return 'continuous' + suffix
    286 
--> 287     if (len(np.unique(y)) > 2) or (y.ndim >= 2 and len(y[0]) > 1):
    288         return 'multiclass' + suffix  # [1, 2, 3] or [[1., 2., 3]] or [[1, 2]]
    289     else:

~/Desktop/Anaconda/anaconda3/envs/nlp_course/lib/python3.7/site-packages/numpy/lib/arraysetops.py in unique(ar, return_index, return_inverse, return_counts, axis)
    231     ar = np.asanyarray(ar)
    232     if axis is None:
--> 233         ret = _unique1d(ar, return_index, return_inverse, return_counts)
    234         return _unpack_tuple(ret)
    235 

~/Desktop/Anaconda/anaconda3/envs/nlp_course/lib/python3.7/site-packages/numpy/lib/arraysetops.py in _unique1d(ar, return_index, return_inverse, return_counts)
    279         aux = ar[perm]
    280     else:
--> 281         ar.sort()
    282         aux = ar
    283     mask = np.empty(aux.shape, dtype=np.bool_)

TypeError: '<' not supported between instances of 'float' and 'str'
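The last frames of the traceback are inside np.unique(y), called from type_of_target(y), so the comparison that fails appears to involve the labels passed as y_train rather than the TF-IDF matrix. A small sketch of how such a mix of types could be detected follows. The array y and its values are hypothetical, and the assumption that NaN among string labels is the cause is only one possible explanation.

```python
from collections import Counter

import numpy as np

# Hypothetical label array: string classes plus one NaN, which is a float.
y = np.array(["pos", "neg", np.nan, "pos"], dtype=object)

# Count the Python types present among the labels.
type_counts = Counter(type(v).__name__ for v in y)
print(type_counts)

# np.unique sorts the values, so on an object array that mixes str and
# float this call is expected to fail in the same way as the traceback.
try:
    np.unique(y)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

Running the same type count on the real y_train would show whether it holds more than one Python type. If it does, those rows are the ones to inspect before fitting.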

0 answers:

No answers