LabelEncoder TypeError?

Date: 2017-08-30 16:27:13

Tags: python-3.x scikit-learn label encoder

I'm trying to encode some text values with LabelEncoder. To do this, I wrote:

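import numpy as np
import pandas as pd
from sklearn import preprocessing

onehot = pd.DataFrame()
encoders = []
# Label-encode every column that is not int64/int32, keeping each fitted encoder
for column in df_resolved.loc[:, ((df_resolved.dtypes != np.int64)&(df_resolved.dtypes != np.int32))]:
    enc = preprocessing.LabelEncoder()
    encoders.append(enc)
    onehot[column] = enc.fit_transform(df_resolved[column])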

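I need to reproduce the encoding on new data, so I have to store the encoders; that's why I'm doing it this way. However, I get an error:

TypeError: '>' not supported between instances of 'str' and 'int'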

I don't understand why this happens. According to the documentation, the encoder should be able to encode strings. What am I missing?
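The reason given for keeping the encoders is to reapply the same mapping to new data. A minimal sketch of that pattern, assuming two small made-up frames `train` and `new` (hypothetical stand-ins for `df_resolved` and the future data): each fitted encoder is stored in a dict keyed by column name, and new data goes through `transform()` rather than a fresh fit. The `astype(str)` cast is an assumption added here so every value in a column is a string before encoding.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical training data; in the question this role is played by df_resolved.
train = pd.DataFrame({
    "priority": ["high", "low", "medium", "low"],
    "impact": ["2", "1", "3", "1"],
})

# One fitted encoder per column, keyed by column name for later lookup.
encoders = {}
# index=train.index keeps the original row labels instead of a fresh RangeIndex.
encoded = pd.DataFrame(index=train.index)
for column in train.columns:
    enc = LabelEncoder()
    # astype(str) guarantees every value is a string, so the values can be sorted.
    encoded[column] = enc.fit_transform(train[column].astype(str))
    encoders[column] = enc

# Hypothetical new data: reuse the stored encoders with transform(), never refit.
new = pd.DataFrame({"priority": ["low", "high"], "impact": ["3", "2"]})
new_encoded = pd.DataFrame(
    {col: encoders[col].transform(new[col].astype(str)) for col in new.columns},
    index=new.index,
)
print(encoded)
print(new_encoded)
```

Keying by column name avoids relying on the order of a list, and passing the source index matters when it has gaps (the df info in the update shows 1436 entries labelled 0 to 1706). Note that `transform()` raises a `ValueError` for a label the encoder never saw during fitting.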

Full stack trace:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-330-f9a564c7c9ab> in <module>()
      8     enc = preprocessing.LabelEncoder()
      9     encoders.append(enc)
---> 10     onehot[column] = enc.fit_transform(df_resolved[column])

/Users/csanadpoda/Documents/Jupyter/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/label.py in fit_transform(self, y)
    129         y = column_or_1d(y, warn=True)
    130         _check_numpy_unicode_bug(y)
--> 131         self.classes_, y = np.unique(y, return_inverse=True)
    132         return y
    133 

/Users/csanadpoda/Documents/Jupyter/anaconda/lib/python3.6/site-packages/numpy/lib/arraysetops.py in unique(ar, return_index, return_inverse, return_counts)
    209 
    210     if optional_indices:
--> 211         perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
    212         aux = ar[perm]
    213     else:

TypeError: '>' not supported between instances of 'str' and 'int'
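The traceback shows the failure happening inside `np.unique(y, return_inverse=True)`, which has to sort the values. In Python 3, a `str` and an `int` cannot be ordered against each other, so an object array that mixes them cannot be sorted. A minimal sketch reproducing this with NumPy alone, using a made-up array `mixed` (hypothetical, not taken from the question's data); the exact wording of the error message may vary between versions.

```python
import numpy as np

# Hypothetical column mixing strings and integers, as a pandas
# object-dtype column can.
mixed = np.array(["high", 3, "low", 1], dtype=object)

try:
    np.unique(mixed, return_inverse=True)  # the same call as in the traceback
    failed = False
except TypeError as exc:
    failed = True
    print("np.unique failed:", exc)

# Casting every value to str gives a homogeneous array that sorts fine.
classes, codes = np.unique(mixed.astype(str), return_inverse=True)
print(classes)  # ['1' '3' 'high' 'low']
print(codes)    # [2 1 3 0]
```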

Update:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1436 entries, 0 to 1706
Data columns (total 26 columns):
u_category                       1436 non-null object
caller_id.country                1436 non-null object
number                           1436 non-null object
priority                         1436 non-null object
urgency                          1436 non-null object
incident_state                   1436 non-null object
u_subcategory                    1436 non-null object
assigned_to                      1436 non-null object
short_description                1436 non-null object
sys_created_on                   1436 non-null datetime64[ns]
business_duration                1436 non-null int64
u_resolved_time                  1436 non-null datetime64[ns]
u_reopen_count                   1436 non-null int64
sys_created_by                   1436 non-null int64
caller_id.u_display_name         1436 non-null object
u_on_behalf_of.u_display_name    1436 non-null object
u_on_behalf_of.email             1436 non-null object
u_actual_time_to_resolve         1436 non-null int64
comments                         1436 non-null object
u_comments_and_work_notes        1436 non-null object
description                      1436 non-null object
impact                           1436 non-null object
u_problem_classification         1436 non-null object
resolution_time                  1436 non-null float64
rawtext                          1436 non-null object
cluster                          1436 non-null int32
dtypes: datetime64[ns](2), float64(1), int32(1), int64(4), object(18)
memory usage: 337.3+ KB

Here is the df info. My SKLearn version is 0.18.1.
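The `object` dtype in the info above only says a column holds arbitrary Python objects; it does not guarantee they are all strings. A minimal sketch for checking which object columns mix Python types, assuming a small made-up frame `df` (the column names echo the info output, but the values are hypothetical):

```python
import pandas as pd

# Hypothetical frame: "number" mixes str and int values, "priority" does not.
df = pd.DataFrame({
    "priority": ["1 - High", "2 - Medium", "3 - Low"],
    "number": ["INC001", 1002, "INC003"],
})

# For each object-dtype column, collect the set of Python types it holds.
mixed_columns = {}
for column in df.select_dtypes(include="object").columns:
    types = set(df[column].map(type))
    if len(types) > 1:
        mixed_columns[column] = sorted(t.__name__ for t in types)

print(mixed_columns)  # {'number': ['int', 'str']}
```

Running the same loop over the real frame would list the columns whose values would need to be made uniform before encoding.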

0 answers:

No answers yet