Question

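I need to encode categorical features in a dataset. I want them to be ordered, so that 'low' becomes 0 and 'vhigh' becomes 3. I tried using the label encoder from sklearn's preprocessing module: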

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(['low', 'med', 'high', 'vhigh'])
ar = le.transform(df[df["buying"] == 'low']["buying"])

Unfortunately, the features are not ordered: line 4 returns an array of 1s, and I want an array of zeros.

I tried creating another encoder that maps those numbers to the numbers I want, but it seems to have no effect.

other_le = preprocessing.LabelEncoder()
other_le.fit([1, 2, 0, 3])
other_le.transform(ar)

The last line returns an array of 1s again.

What is the shortest way to encode categorical features in a given order?

Answer 1

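LabelEncoder orders your labels according to the output of Python's built-in sorted() function, which in this case means alphabetically, so 'high' gets 0 and 'low' gets 1. It is not hard to write your own function that labels the features while keeping the order you are looking for: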

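def label(array):
    # The position in this list is the code: 'low' -> 0, ..., 'vhigh' -> 3.
    labels = ['low', 'med', 'high', 'vhigh']
    # Build a list, because map() returns a lazy iterator in Python 3.
    return [labels.index(value) for value in array]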
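
If you would rather stay within sklearn, one option is OrdinalEncoder, which accepts an explicit category order through its categories parameter. The following is only a sketch: it assumes scikit-learn 0.20 or later, and the buying array is a made-up sample rather than the asker's data. Note that OrdinalEncoder expects 2D input (one column per feature).

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Desired order: the position in this list becomes the integer code.
order = ['low', 'med', 'high', 'vhigh']

# One list of categories per input column; here there is a single column.
encoder = OrdinalEncoder(categories=[order])

# Hypothetical sample column, shaped (n_samples, 1) as the encoder expects.
buying = np.array([['low'], ['vhigh'], ['med'], ['high'], ['low']])

# fit_transform returns floats by default, so flatten and cast to int.
codes = encoder.fit_transform(buying).ravel().astype(int)
print(codes.tolist())  # [0, 3, 1, 2, 0]
```

With a DataFrame, the same call should work on a one-column selection such as df[["buying"]].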

Answer 2

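You can use the factorize function from pandas.
It encodes values in the order in which they first appear in the sequence, i.e. if 'low' comes first it is encoded as 0, 'medium' as 1, and so on.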

import pandas as pd
myli = ['low','medium','high','very_high']
pd.factorize(myli)[0]

# output
array([0, 1, 2, 3])
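
Because factorize follows order of first appearance, it only yields the intended codes when the data happens to list the values in that order. A possible alternative, sketched here with a made-up buying Series, is pd.Categorical with an explicit categories list, which fixes the codes regardless of how the rows are arranged:

```python
import pandas as pd

# Desired order: the position in this list becomes the integer code.
order = ['low', 'med', 'high', 'vhigh']

# Hypothetical sample column whose values are not in the desired order.
buying = pd.Series(['med', 'low', 'vhigh', 'high', 'low'])

# factorize numbers values by first appearance, so 'med' gets 0 here.
appearance_codes = pd.factorize(buying)[0]
print(appearance_codes.tolist())  # [0, 1, 2, 3, 1]

# An ordered Categorical takes its codes from the given category list.
ordered_codes = pd.Categorical(buying, categories=order, ordered=True).codes
print(ordered_codes.tolist())  # [1, 0, 3, 2, 0]
```

Any value missing from the categories list is encoded as -1, which makes unexpected labels easy to spot.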

Encoding categorical features in a given order with sklearn
