Question

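This is a question about sklearn's OneHotEncoder. With an array of all-unique values such as a = [1,2,3,4,5,6,7,8,9,22], where a.shape is [10,1] after reshape(-1,1), the encoder returns a [10,10] matrix of one-hot encoded values.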

array([[ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
   [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
   [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.]])

But with an array that contains repeated values, such as a = [1,2,2,4,4,6,7,8,9,22], where a.shape is again [10,1] after reshape(-1,1), it returns a [10,8] matrix of one-hot encoded values.

array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
   [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])

I can't use that, because my input placeholder expects a [10,10] matrix as input. Can anyone help me handle non-unique values in sklearn's OneHotEncoder?

P.S. Adding the parameter n_values=10 raises ValueError: Feature out of bounds for n_values=10

Answer 1

Do you know all the values your categorical feature can take? If so, you can do this:

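import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
enc.fit(np.asarray([1,2,3,4,5,6,7,8,9,22]).reshape(-1, 1))  # fit the encoder on every possible value
data_for_encoding = np.asarray([1,2,2,4,4,6,7,8,9,22]).reshape(-1, 1)  # your data, duplicates included
sparse_matrix = enc.transform(data_for_encoding)  # encoded data, one column per fitted value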
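
If you would rather not fit on a separate array, the same idea can be written by handing the full list of values to the encoder directly. This is only a sketch, and it assumes a scikit-learn release that has the categories parameter (0.20 or later, as far as I know). The names known_values and dense are made up for this example.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Assumption: these are all the values the feature can ever take (must be sorted).
known_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 22]

# The data to encode, with 2 and 4 repeated and 3 and 5 missing.
data = np.asarray([1, 2, 2, 4, 4, 6, 7, 8, 9, 22]).reshape(-1, 1)

# One list of categories per input column; here there is a single column.
enc = OneHotEncoder(categories=[known_values])

# fit_transform returns a sparse matrix by default; toarray() makes it dense.
dense = enc.fit_transform(data).toarray()

print(dense.shape)
```

The columns for 3 and 5 stay all zero, so the width is fixed by known_values and not by whatever happens to appear in the data.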
