I'm trying to label-encode the second column, but I get an error. What exactly am I doing wrong? I am able to encode the first column.
data.head()
   area_type            availability   location                  size       society  total_sqft  bath  balcony  price
0  Super built-up Area  19-Dec         Electronic City Phase II  2 BHK      Coomee   1056        2.0   1.0      39.07
1  Plot Area            Ready To Move  Chikka Tirupathi          4 Bedroom  Theanmp  2600        5.0   3.0      120.00
2  Built-up Area        Ready To Move  Uttarahalli               3 BHK      NaN      1440        2.0   3.0      62.00
3  Super built-up Area  Ready To Move  Lingadheeranahalli        3 BHK      Soiewre  1521        3.0   1.0      95.00
4  Super built-up Area  Ready To Move  Kothanur                  2 BHK      NaN      1200        2.0   1.0      51.00
enc = LabelEncoder()
data.iloc[:,2] = enc.fit_transform(data.iloc[:,2])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-53fda4a71b5e> in <module>()
1 enc = LabelEncoder()
----> 2 data.iloc[:,2] = enc.fit_transform(data.iloc[:,2])
~/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/label.py in fit_transform(self, y)
110 """
111 y = column_or_1d(y, warn=True)
--> 112 self.classes_, y = np.unique(y, return_inverse=True)
113 return y
114
~/anaconda3/lib/python3.6/site-packages/numpy/lib/arraysetops.py in unique(ar, return_index, return_inverse, return_counts, axis)
208 ar = np.asanyarray(ar)
209 if axis is None:
--> 210 return _unique1d(ar, return_index, return_inverse, return_counts)
211 if not (-ar.ndim <= axis < ar.ndim):
212 raise ValueError('Invalid axis kwarg specified for unique')
~/anaconda3/lib/python3.6/site-packages/numpy/lib/arraysetops.py in _unique1d(ar, return_index, return_inverse, return_counts)
272
273 if optional_indices:
--> 274 perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
275 aux = ar[perm]
276 else:
TypeError: '<' not supported between instances of 'float' and 'str'
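I want to label-encode the second column, "location". If I index with data.iloc[:,1] = enc.fit_transform(data.iloc[:,1]) instead, I can label-encode the availability column without any problem.
So how do I fix this?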
Answer 0 (score: 2)
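What is the data type of your column?
The error occurs because the label encoder has to sort the values, and it cannot sort numbers (np.nan is a float) together with strings. In other words, the column most likely contains missing values alongside the location names.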
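The cause can be reproduced without pandas or scikit-learn. The following is only a minimal sketch with made-up sample values (the list `values` is hypothetical, not taken from the dataset): sorting a list that mixes strings with a float NaN fails with the same kind of TypeError that appears in the traceback.

```python
# Hypothetical sample: two location names and one missing value.
# A missing value in pandas is usually stored as a float NaN.
values = ["Kothanur", float("nan"), "Uttarahalli"]

error_message = None
try:
    # Sorting needs to compare a float with a str, which Python refuses to do.
    sorted(values)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# List the entries that are not strings; these are the ones that break the sort.
non_strings = [v for v in values if not isinstance(v, str)]
print(non_strings)
```

On a real DataFrame, something like data['col_name'].isna().sum() should tell you how many such values the column holds.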
To fix this, you can either:
- replace all NaN values with an empty string: data['col_name'] = data['col_name'].fillna('');
- or convert the whole column to strings: data['col_name'] = data['col_name'].astype(str)
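Both options can be sketched as follows. This is an illustration under assumptions, not a drop-in fix: the small frame and the variable names `filled`, `as_text`, `codes_filled` and `codes_text` are made up for the example, and only the column name "location" comes from the question.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical miniature version of the "location" column with one missing value.
data = pd.DataFrame(
    {"location": ["Kothanur", np.nan, "Uttarahalli", "Kothanur"]}
)

# Option 1: replace missing values with an empty string before encoding.
# The empty string then becomes a class of its own.
filled = data["location"].fillna("")
codes_filled = LabelEncoder().fit_transform(filled)
print(codes_filled)

# Option 2: cast the column to str before encoding.
# Assumption: on the pandas versions this answer was written for, NaN
# turns into the literal text "nan"; newer versions may behave differently.
as_text = data["location"].astype(str)
codes_text = LabelEncoder().fit_transform(as_text)
print(codes_text)
```

Either way the missing entries end up with a label of their own, so it is worth deciding whether an "unknown location" class is what you actually want, or whether those rows should be dropped or imputed first.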