Question

StringIndexer encodes a string column of labels into a column of label indices.

id | category | categoryIndex
----|----------|---------------
 0  | a        | 0.0
 1  | b        | 2.0
 2  | c        | 1.0
 3  | a        | 0.0
 4  | a        | 0.0
 5  | c        | 1.0

How can I implement this in Python, for example with pandas or numpy, without using pyspark.ml.feature StringIndexer?

Answer 1

Since you mention pandas, try using ngroup:

df.groupby('category').ngroup()
Out[564]: 
0    0
1    1
2    2
3    0
4    0
5    2
dtype: int64
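
Note that ngroup numbers the groups in sorted key order (a, b, c), so b gets 1 and c gets 2, whereas the table in the question assigns indices by descending frequency (a, c, b). If the frequency-based order matters, something like the following may get closer. This is only a minimal sketch: the sample DataFrame is rebuilt from the question's table, the names `counts`, `ordered`, and `mapping` are made up for illustration, and breaking ties alphabetically is an assumption of this sketch rather than something the question specifies.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "id": range(6),
    "category": ["a", "b", "c", "a", "a", "c"],
})

# Count how often each label occurs.
counts = df["category"].value_counts()

# Order labels by descending count; ties are broken alphabetically
# (an assumption of this sketch).
ordered = sorted(counts.index, key=lambda label: (-counts[label], label))

# Most frequent label gets 0.0, the next one 1.0, and so on.
mapping = {label: float(i) for i, label in enumerate(ordered)}
df["categoryIndex"] = df["category"].map(mapping)

print(df)
```

With the sample data this maps a to 0.0, c to 1.0, and b to 2.0, which matches the categoryIndex column in the question.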
