scikit-learn: label encoding for ordinal and integer columns

Time: 2019-07-19 19:35:03

Tags: python encoding scikit-learn

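I have different kinds of columns to encode: (a) nominal values (strings), (b) ordinal values (strings), and (c) ordinal values (numbers). The numbers in group (c) come from a manual mapping of labels such as "Excellent", "Poor", etc., because alphabetical order would not assign the values in the correct rank order.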

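Knowing that the output of LabelEncoder is a column of integers, should I apply LabelEncoder to both group b and group c, or only to group b? Below I give an example of each group.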

Group a (OneHotEncoding)

GarageType: Garage location

       2Types   More than one type of garage
       Attchd   Attached to home
       Basment  Basement Garage
       BuiltIn  Built-In (Garage part of house - typically has room above garage)
       CarPort  Car Port
       Detchd   Detached from home
       NA   No Garage

Group b (LabelEncoding)

GarageFinish: Interior finish of the garage

       Fin  Finished
       RFn  Rough Finished  
       Unf  Unfinished
       NA   No Garage
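
To make the ordering concern concrete, here is a minimal sketch of what LabelEncoder does with the GarageFinish labels. The sample values are made up for illustration; the only behaviour relied on is that LabelEncoder sorts the distinct labels and numbers them in that sorted order.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of GarageFinish values (not taken from the real data).
garage_finish = ["Unf", "RFn", "Fin", "NA", "Unf"]

encoder = LabelEncoder()
codes = encoder.fit_transform(garage_finish)

# classes_ holds the distinct labels in sorted order; each label's code
# is its position in that sorted array.
print(list(encoder.classes_))  # ['Fin', 'NA', 'RFn', 'Unf']
print(codes.tolist())          # [3, 2, 0, 1, 3]
```

Here "NA" (no garage) lands between "Fin" and "RFn", so the integer codes do not follow the finish levels from unfinished to finished.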

Group c (mapping)

GarageQual: Garage quality

       Ex   Excellent
       Gd   Good
       TA   Typical/Average
       Fa   Fair
       Po   Poor
       NA   No Garage

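I have already mapped the group c values correctly. My question is whether I still have to apply LabelEncoder to them even though they are already integers.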
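
For reference, a minimal sketch of the kind of mapping meant for group c, assuming pandas. The integer codes 0 to 5 and the choice to put "NA" (no garage) at 0 are assumptions made for this example, not something the data description prescribes, and the sample column is invented.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Assumed rank order, worst to best; "NA" (no garage) placed at 0 by choice.
quality_order = {"NA": 0, "Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}

# Hypothetical sample column built from literal strings.
df = pd.DataFrame({"GarageQual": ["TA", "Ex", "NA", "Po", "Gd", "Fa"]})

# Explicit mapping keeps the intended order instead of alphabetical order.
df["GarageQual_code"] = df["GarageQual"].map(quality_order)

# Running LabelEncoder on the already-mapped integers re-ranks the sorted
# distinct values; with contiguous codes 0..5 that leaves them unchanged.
relabelled = LabelEncoder().fit_transform(df["GarageQual_code"])

print(df["GarageQual_code"].tolist())  # [3, 5, 0, 1, 4, 2]
print(relabelled.tolist())             # [3, 5, 0, 1, 4, 2]
```

In this sample every level is present, so the second pass changes nothing. If some levels were missing from the column, LabelEncoder would compress the remaining codes (for example 1, 3, 5 would become 0, 1, 2), which changes the spacing but not the order.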

0 answers:

No answers