How to use sklearn's LabelBinarizer to convert label data containing None values to one-hot

Asked: 2018-10-09 12:09:05

Tags: python pandas scikit-learn one-hot-encoding

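I have label data in which some of the values are np.nan.
I want to use LabelBinarizer to convert the data to one-hot vectors, with np.nan converted to an all-zeros array.
But I get an error. I was able to convert the data with get_dummies from pandas. However, I cannot use get_dummies, because the training and test data come from different files and different times. I want to use a sklearn model so that I can save it and reuse it later.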

Example code:

In[11]: df = pd.DataFrame({'CITY':['London','NYC','Manchester',np.nan],'Country':['UK','US','UK','AUS']})
In[12]: df
Out[12]: 
         CITY Country
0      London      UK
1         NYC      US
2  Manchester      UK
3         NaN     AUS
In[13]: pd.get_dummies(df['CITY'])
Out[13]: 
   London  Manchester  NYC
0       1           0    0
1       0           0    1
2       0           1    0
3       0           0    0
In[14]: from sklearn.preprocessing import LabelBinarizer
        lb = LabelBinarizer()
In[15]: lb.fit_transform(df['CITY'])

Traceback (most recent call last):
  File "/home/oshrib/.conda/envs/on_target/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-16-d0afb38b2695>", line 1, in <module>
    lb.fit_transform(df['CITY'])
  File "/home/oshrib/.conda/envs/on_target/lib/python3.5/site-packages/sklearn/preprocessing/label.py", line 307, in fit_transform
    return self.fit(y).transform(y)
  File "/home/oshrib/.conda/envs/on_target/lib/python3.5/site-packages/sklearn/preprocessing/label.py", line 276, in fit
    self.y_type_ = type_of_target(y)
  File "/home/oshrib/.conda/envs/on_target/lib/python3.5/site-packages/sklearn/utils/multiclass.py", line 288, in type_of_target
    if (len(np.unique(y)) > 2) or (y.ndim >= 2 and len(y[0]) > 1):
  File "/home/oshrib/.conda/envs/on_target/lib/python3.5/site-packages/numpy/lib/arraysetops.py", line 223, in unique
    return _unique1d(ar, return_index, return_inverse, return_counts)
  File "/home/oshrib/.conda/envs/on_target/lib/python3.5/site-packages/numpy/lib/arraysetops.py", line 283, in _unique1d
    ar.sort()
TypeError: unorderable types: float() < str()
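
The traceback ends in `ar.sort()` inside `np.unique`, which suggests the failure comes from comparing the float `np.nan` with the string city names. One possible workaround, shown here as a minimal sketch rather than a definitive answer, is to swap LabelBinarizer for sklearn's `OneHotEncoder` with `handle_unknown='ignore'`: fit it on the non-missing cities only, then replace NaN with a placeholder string before transforming so the missing row is treated as an unknown category and encoded as all zeros. The placeholder `'__missing__'` and the variable names are assumptions made for illustration, and the sketch assumes a sklearn version whose `OneHotEncoder` accepts string categories.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({'CITY': ['London', 'NYC', 'Manchester', np.nan],
                   'Country': ['UK', 'US', 'UK', 'AUS']})

# Keep the column as a one-column DataFrame of Python objects.
city = df[['CITY']].astype(object)

# Fit only on rows where CITY is present, so NaN never becomes a category.
encoder = OneHotEncoder(handle_unknown='ignore')
encoder.fit(city.dropna())

# '__missing__' is a hypothetical placeholder; it must not be a real city name.
# Because it was not seen during fit, it is encoded as an all-zeros row.
filled = city.fillna('__missing__')
onehot = encoder.transform(filled).toarray()

# The fitted encoder can be serialized and reused later on the test file.
restored = pickle.loads(pickle.dumps(encoder))
onehot_again = restored.transform(filled).toarray()

print(encoder.categories_[0])
print(onehot)
```

With this data the columns follow the sorted category order (London, Manchester, NYC), the same layout as the get_dummies output above, and the NaN row comes out as zeros. Any city that appears only in the test data would also be encoded as zeros, which may or may not be the behaviour you want.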

0 answers:

No answers