Question

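I am trying to use scikit-learn's LabelBinarizer on a categorical column of a pandas DataFrame.

When I do, I get this error:

TypeError: unorderable types: float() < str()

As you can see below, train_data['embarked'] is a categorical column that contains only 3 values. But when I use LabelBinarizer on it, I get the error above.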

train_data['embarked'].head()

train_data['embarked'].value_counts()

from sklearn.preprocessing import LabelBinarizer
labelbinarizer = LabelBinarizer()
lb_result = labelbinarizer.fit_transform(train_data["embarked"])

The output of the first two lines is shown below.

0    S
1    C
2    S
3    S
4    S
Name: embarked, dtype: object

S    644
C    168
Q     77
Name: embarked, dtype: int64

The last line causes the error. The full error message is shown below.

Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    lb_result = labelbinarizer.fit_transform(train_data["embarked"])
  File "/usr/local/lib/python3.5/dist-packages/sklearn/preprocessing/label.py", line 307, in fit_transform
    return self.fit(y).transform(y)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/preprocessing/label.py", line 276, in fit
    self.y_type_ = type_of_target(y)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/utils/multiclass.py", line 284, in type_of_target
    if (len(np.unique(y)) > 2) or (y.ndim >= 2 and len(y[0]) > 1):
  File "/usr/local/lib/python3.5/dist-packages/numpy/lib/arraysetops.py", line 264, in unique
    ret = _unique1d(ar, return_index, return_inverse, return_counts)
  File "/usr/local/lib/python3.5/dist-packages/numpy/lib/arraysetops.py", line 312, in _unique1d
    ar.sort()
TypeError: unorderable types: float() < str()

I can't understand what is wrong with this code.

Answer 1

The float() < str() in the error message shows that a float is being compared with a string while the values are sorted, so the column holds mixed types. This usually means it contains missing values, since NaN is a float. Use astype('str') to convert the column to strings before calling fit_transform:

lb_result = labelbinarizer.fit_transform(train_data["embarked"].astype('str'))
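A minimal sketch of the string-casting idea, under some assumptions: the small `embarked` Series below is a hypothetical stand-in for train_data["embarked"], and the "missing" placeholder is an illustrative choice, not part of the original answer. The sketch also swaps in an explicit fillna before astype(str), so that the missing category gets a readable name and does not depend on how a given pandas version casts NaN to a string.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Hypothetical stand-in for train_data["embarked"]: strings plus one NaN.
embarked = pd.Series(["S", "C", "S", "Q", np.nan], dtype=object, name="embarked")

# NaN is a float, so the column mixes float and str values.
n_missing = int(embarked.isna().sum())

# Give missing values an explicit label ("missing" is an assumed placeholder),
# then cast to str so every value has the same, sortable type.
cleaned = embarked.fillna("missing").astype(str)

lb = LabelBinarizer()
encoded = lb.fit_transform(cleaned)

# One column per class, in sorted order.
classes = list(lb.classes_)
print(n_missing)
print(classes)
print(encoded.shape)
```

Whether missing values should become their own category or be dropped or imputed depends on the data. Calling train_data["embarked"].isna().sum() first shows how many rows are affected.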
