I have a set of labels from 0 to 9, for example:
2 7 5 3
I want to convert them to a one-hot encoding, like this:
0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0 0 0
So I wrote this function:
def make_one_hot(m):
    result = pd.DataFrame([])
    for i in range(0, len(m)):
        x = [0] * 10
        x[m[i]] = 1
        result = result.append(x)
    print("result: " + result)
    return result
When printing the result, I get this error:
Traceback (most recent call last):
File "../src/script.py", line 23, in <module>
train_labels = make_one_hot(train_data.ix[:,0])
File "../src/script.py", line 18, in make_one_hot
print("result: " + result)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/ops.py", line 1241, in f
return self._combine_const(other, na_op)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py", line 3641, in _combine_const
raise_on_error=raise_on_error)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/internals.py", line 3197, in eval
return self.apply('eval', **kwargs)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/internals.py", line 3091, in apply
applied = getattr(b, f)(**kwargs)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/internals.py", line 1205, in eval
repr(other))
TypeError: Could not compare ['result: '] with block values
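Since I'm new to Python, I'm not sure whether only the print statement is wrong or whether the way I build the array is wrong too.
So what is a simple and correct way to do this?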
Answer 0 (score: 4)
Approach #1: Here's one approach using NumPy broadcasting -
In [143]: a = [2 ,7 ,5 ,3]
In [144]: pd.DataFrame((np.asarray(a)[:,None] == np.arange(10)).astype(int))
Out[144]:
   0  1  2  3  4  5  6  7  8  9
0  0  0  1  0  0  0  0  0  0  0
1  0  0  0  0  0  0  0  1  0  0
2  0  0  0  0  0  1  0  0  0  0
3  0  0  0  1  0  0  0  0  0  0
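To make the broadcasting step easier to follow, here is a small sketch that splits the one-liner into its intermediate arrays. The variable names (column, classes, mask, one_hot) are illustrative and not part of the original answer.

```python
import numpy as np

a = [2, 7, 5, 3]

# Turn the labels into a column vector of shape (4, 1).
column = np.asarray(a)[:, None]

# The candidate classes form a 1-D array of shape (10,).
classes = np.arange(10)

# Comparing (4, 1) with (10,) broadcasts to a (4, 10) boolean array:
# entry [i, j] is True exactly when label i equals class j.
mask = column == classes

# Cast the booleans to integers to get the 0/1 encoding.
one_hot = mask.astype(int)
```

Each row of one_hot contains a single 1, in the column whose index equals the label.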
Approach #2: Another one with zeros-initialization -
In [145]: out = np.zeros((len(a), 10),dtype=int)
In [146]: out[np.arange(len(a)), a] = 1
In [147]: pd.DataFrame(out)
Out[147]:
   0  1  2  3  4  5  6  7  8  9
0  0  0  1  0  0  0  0  0  0  0
1  0  0  0  0  0  0  0  1  0  0
2  0  0  0  0  0  1  0  0  0  0
3  0  0  0  1  0  0  0  0  0  0
Approach #3: Using SciPy's sparse matrix -
In [166]: from scipy.sparse import csr_matrix
In [167]: n = len(a)
In [169]: pd.DataFrame(csr_matrix(([1]*n, (range(n), a)), shape=(n, 10)).toarray())
Out[169]:
   0  1  2  3  4  5  6  7  8  9
0  0  0  1  0  0  0  0  0  0  0
1  0  0  0  0  0  0  0  1  0  0
2  0  0  0  0  0  1  0  0  0  0
3  0  0  0  1  0  0  0  0  0  0
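None of the three approaches needs the print call that the traceback in the question points to. That line fails because pandas treats `"result: " + result` as an element-wise operation between a string and the DataFrame, not as string concatenation. As a minimal sketch, assuming the labels are integers in the range 0 to 9, the question's make_one_hot could be rebuilt on top of Approach #2. The num_classes parameter is an illustrative addition.

```python
import numpy as np
import pandas as pd


def make_one_hot(labels, num_classes=10):
    """Return a DataFrame with one row per label and one column per class."""
    labels = np.asarray(labels, dtype=int)
    out = np.zeros((len(labels), num_classes), dtype=int)
    # For row i, set the column given by labels[i] to 1.
    out[np.arange(len(labels)), labels] = 1
    return pd.DataFrame(out)


result = make_one_hot([2, 7, 5, 3])

# Pass the DataFrame as its own argument instead of adding it to a string.
print("result:")
print(result)
```

Converting with np.asarray also lets the function accept a pandas Series, such as a column selected from the training data.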
Answer 1 (score: 0)
Why not use the pandas built-in function, pd.get_dummies?
a = [2, 7, 5, 3]
pd.get_dummies(a)
Out:
| 2 | 3 | 5 | 7
---|---|---|---|---
0 | 1 | 0 | 0 | 0
1 | 0 | 0 | 0 | 1
2 | 0 | 0 | 1 | 0
3 | 0 | 1 | 0 | 0
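Note that the output above only has columns for the labels that actually occur (2, 3, 5, 7), while the question asks for all ten classes. One possible way to get the full width, sketched here under the assumption that the classes are exactly 0 to 9, is to reindex the columns afterwards. The dtype=int argument is there because newer pandas versions may return booleans by default.

```python
import pandas as pd

a = [2, 7, 5, 3]

# get_dummies creates one column per distinct value found in `a`.
dummies = pd.get_dummies(a, dtype=int)

# Add the missing classes as all-zero columns, in order 0..9.
one_hot = dummies.reindex(columns=range(10), fill_value=0)
```

This keeps the get_dummies approach and produces the same 4 x 10 layout as the NumPy-based approaches.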