How can I efficiently convert a boolean table into one-hot vectors?

Time: 2019-12-23 16:03:08

Tags: python one-hot-encoding

Suppose I have a table that looks like the one below, and I want to convert it into one-hot vectors:

Movie        Action      Scifi       Drama          Romance
Abc           True       False       False            False
Def           False      False        True            False
Ghi           False      False       False            True

It is known that only one column can be True in each row.
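That assumption can be checked before encoding. The following is a minimal sketch, assuming the table is held in a pandas DataFrame; the names `df`, `genre_cols`, `true_counts` and `is_one_hot` are illustrative and do not come from the original post.

```python
import pandas as pd

# Hypothetical DataFrame mirroring the table above
df = pd.DataFrame(
    {
        "Movie": ["Abc", "Def", "Ghi"],
        "Action": [True, False, False],
        "Scifi": [False, False, False],
        "Drama": [False, True, False],
        "Romance": [False, False, True],
    }
)

genre_cols = ["Action", "Scifi", "Drama", "Romance"]

# Count the True values in each row; a valid one-hot row has exactly one
true_counts = df[genre_cols].sum(axis=1)
is_one_hot = bool((true_counts == 1).all())
print(is_one_hot)
```

If `is_one_hot` is False, some row has zero or several True values and should be handled before encoding.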

Is there an efficient way to do this in Python?

2 answers:

Answer 0: (score: 0)

You can use NumPy for this.

import numpy as np

Abc = np.array([True,False,False,False])
Def = np.array([False,False,True,False])
Ghi = np.array([False,False,False,True])
movies = np.array([Abc, Def, Ghi])
print("Input:")
print(movies)

# Cast from boolean to integer (np.int was removed in NumPy 1.24; use the built-in int)
result = movies.astype(int)

print("Output:")
print(result)
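The same cast also works when the table lives in a pandas DataFrame rather than in hand-built arrays. This is a sketch under that assumption; the DataFrame `df` and the names `genre_cols`, `one_hot` and `vectors` are illustrative and not part of the original answer.

```python
import pandas as pd

# Hypothetical DataFrame mirroring the table in the question
df = pd.DataFrame(
    {
        "Movie": ["Abc", "Def", "Ghi"],
        "Action": [True, False, False],
        "Scifi": [False, False, False],
        "Drama": [False, True, False],
        "Romance": [False, False, True],
    }
)

genre_cols = ["Action", "Scifi", "Drama", "Romance"]

# Cast all boolean genre columns to integers in one vectorized step
one_hot = df.set_index("Movie")[genre_cols].astype(int)
print(one_hot)

# Plain NumPy array with one one-hot row per movie
vectors = one_hot.to_numpy()
```

Because `astype` operates on whole columns, no Python-level loop over the rows is needed.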

Answer 1: (score: 0)

OK, so I found a way that works for larger datasets.

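import numpy as np
import pandas as pd

# df is assumed to have boolean columns 'action', 'scifi', 'drama' and 'romance'
# Placeholder values; each row is overwritten in the loop below
df['genre'] = pd.Series(np.random.randn(len(df)), index=df.index)
for idx in df.index:
    # Store the class index of whichever genre column is True
    if df.at[idx, 'action']:
        df.at[idx, 'genre'] = 0
    elif df.at[idx, 'scifi']:
        df.at[idx, 'genre'] = 1
    elif df.at[idx, 'drama']:
        df.at[idx, 'genre'] = 2
    elif df.at[idx, 'romance']:
        df.at[idx, 'genre'] = 3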

By doing this, we create a new column named 'genre' in the DataFrame and give each row the appropriate class index. After that,

import tensorflow as tf

y = df['genre']
y_categorical = tf.keras.utils.to_categorical(y)

This does the job of converting it into one-hot vectors.
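For larger tables, the row-by-row loop can likely be replaced by a vectorized step. The sketch below is an alternative, not the answer's own method: it assumes the same lowercase column names, builds a small hypothetical `df` for illustration, and uses NumPy's `argmax` and `eye` instead of TensorFlow.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with the column names used in the loop above
df = pd.DataFrame(
    {
        "action": [True, False, False],
        "scifi": [False, False, False],
        "drama": [False, True, False],
        "romance": [False, False, True],
    }
)

genre_cols = ["action", "scifi", "drama", "romance"]

# argmax returns the position of the first True in each row,
# which is the same class index the loop assigns
genre = df[genre_cols].to_numpy().argmax(axis=1)

# Indexing an identity matrix by class index yields one one-hot row per sample
y_categorical = np.eye(len(genre_cols), dtype=int)[genre]
print(y_categorical)
```

Note that `argmax` returns 0 for a row with no True value, so this relies on every row having exactly one True column.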