Converting string data into one-hot vectors for machine learning

Time: 2019-01-10 13:29:20

Tags: python-3.x one-hot-encoding

Hi everyone. I have a list of strings:

labels = ["Synonym", "Antonym", "Not relevant", "Synonym", "Antonym"]

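There are 3 distinct labels. I would like to first map them to the numbers 1, 2 and 3, and then build one-hot vectors from them, e.g. label 3 -> 0 0 1. Any idea how to do this?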

1 answer:

Answer 0 (score: 1)

A simple, library-free solution is:

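labels = ["Synonym", "Antonym", "Not relevant", "Synonym", "Antonym"]

# Assign each distinct label an index (0-based). A set has no guaranteed
# order, so which label gets which index can differ between runs.
mapping = {label: i for i, label in enumerate(set(labels))}

# Build one vector per label: all zeros except a 1 at the label's index.
one_hot = []
for label in labels:
    entry = [0] * len(mapping)
    entry[mapping[label]] = 1
    one_hot.append(entry)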

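Result (the column order depends on the set's iteration order, so it can vary between runs), for example: [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]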
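The mapping above is built from a set, so the index each label receives is not fixed, and the indices start at 0 rather than at 1 as the question asks. A minimal sketch of a reproducible variant follows. It assumes that alphabetical order is an acceptable way to number the labels, and the function name one_hot_encode is made up for illustration:

```python
def one_hot_encode(labels):
    """Return (codes, vectors) for a list of string labels.

    codes maps each distinct label to an integer starting at 1;
    vectors holds one one-hot list per input label.
    """
    # Sort the distinct labels so the numbering is the same on every run
    # (alphabetical order is an assumption, not part of the original answer).
    classes = sorted(set(labels))
    codes = {label: i for i, label in enumerate(classes, start=1)}

    vectors = []
    for label in labels:
        vector = [0] * len(classes)
        vector[codes[label] - 1] = 1  # code k sets position k - 1
        vectors.append(vector)
    return codes, vectors


labels = ["Synonym", "Antonym", "Not relevant", "Synonym", "Antonym"]
codes, vectors = one_hot_encode(labels)
print(codes)    # {'Antonym': 1, 'Not relevant': 2, 'Synonym': 3}
print(vectors)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

With this numbering, label 3 ("Synonym") becomes 0 0 1, which matches the example in the question.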

However, you may want to look into sklearn, in particular sklearn.preprocessing.OneHotEncoder.
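For comparison, here is a rough sketch of the same encoding with OneHotEncoder. It assumes scikit-learn is installed, and it converts the result with toarray() because the encoder returns a sparse matrix by default. The encoder expects a 2D input, so each label is wrapped in its own row:

```python
from sklearn.preprocessing import OneHotEncoder

labels = ["Synonym", "Antonym", "Not relevant", "Synonym", "Antonym"]

# OneHotEncoder works on 2D data: one row per sample, one column per feature.
samples = [[label] for label in labels]

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense.
encoded = encoder.fit_transform(samples).toarray()

# categories_ holds the learned labels per feature, in sorted order.
print(encoder.categories_[0])
print(encoded)
```

The fitted encoder can then be reused with encoder.transform on new data, so the same label always maps to the same column.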