Python pandas / numpy: encode unique values as integers

Date: 2016-01-14 12:12:56

Tags: python pandas integer unique categorical-data

Say I have x=["apple","orange","orange","apple","pear"] and I want an integer representation of the categories, e.g. y=[1,2,2,1,3]. What is the best way to do this?

3 answers:

Answer 0 (score: 0)

You can use:

import pandas as pd

x=["apple","orange","orange","apple","pear"]
s = pd.Series(x)

print(s)

0     apple
1    orange
2    orange
3     apple
4      pear

print(pd.Categorical(s).codes)

[0 1 1 0 2]

Or:

import pandas as pd

x=["apple","orange","orange","apple","pear"]

print(pd.Categorical(x).codes)

# [0 1 1 0 2]
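One thing to keep in mind with this approach: pd.Categorical infers its categories in sorted order, so the codes follow lexical order rather than order of first appearance. The list in the question happens to make the two coincide. A minimal sketch, assuming a hypothetical reordered list `fruits` (not from the question), which also adds 1 to get the 1-based labels the question asks for:

```python
import pandas as pd

# Hypothetical input (not from the question): "pear" appears first here.
fruits = ["pear", "apple", "orange", "apple"]

cat = pd.Categorical(fruits)

# Categories are inferred in sorted order, so "apple" gets code 0
# even though "pear" comes first in the list.
categories = list(cat.categories)

# Codes are 0-based and stored in a small integer dtype, so widen
# them before shifting to 1-based labels.
one_based = cat.codes.astype("int64") + 1

print(categories)
print(one_based.tolist())
```

Missing values receive the code -1, so after the shift they would show up as 0.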

Answer 1 (score: 0)

You can use pd.factorize and take element 0 of the returned tuple:

In [465]: pd.factorize(x)
Out[465]: (array([0, 1, 1, 0, 2]), array(['apple', 'orange', 'pear'], dtype=object))

In [466]: pd.factorize(x)[0] + 1
Out[466]: array([1, 2, 2, 1, 3])
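By default pd.factorize numbers values in order of first appearance, and sort=True switches to sorted order. Since the title also mentions numpy, np.unique with return_inverse=True gives a NumPy-only equivalent of the sorted variant. A rough sketch, assuming a hypothetical reordered list `fruits` (not from the question) so that the two orderings differ; the list is wrapped in a Series or array rather than passed bare, which should behave the same across pandas versions:

```python
import numpy as np
import pandas as pd

# Hypothetical input (not from the question): "pear" appears first here.
fruits = ["pear", "apple", "orange", "apple"]

# Default: codes follow order of first appearance.
codes, uniques = pd.factorize(pd.Series(fruits))

# sort=True: codes follow the sorted order of the unique values.
sorted_codes, sorted_uniques = pd.factorize(pd.Series(fruits), sort=True)

# NumPy only: np.unique returns the sorted unique values, and
# return_inverse gives each element's index into that sorted array.
np_uniques, np_codes = np.unique(np.array(fruits), return_inverse=True)

print(codes.tolist(), list(uniques))
print(sorted_codes.tolist(), list(sorted_uniques))
print(np_codes.tolist(), np_uniques.tolist())
```

If the integer labels should be stable regardless of input order, the sorted variants are probably the safer choice.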

Answer 2 (score: -1)

Using pandas (x must be a Series, not a plain list): pd.Series(x).astype('category').cat.codes
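Both .astype and the .cat accessor are Series methods, so a plain list has to be wrapped first. A short sketch using the question's list; the names `s`, `codes`, and `decoded` are illustrative only. It also shows one way to map the codes back to the original labels through cat.categories:

```python
import pandas as pd

x = ["apple", "orange", "orange", "apple", "pear"]

# Wrap the list in a Series, then convert to the category dtype.
s = pd.Series(x).astype("category")

# 0-based integer code for each element.
codes = s.cat.codes

# Index the category labels with the codes to recover the originals.
decoded = s.cat.categories[codes.to_numpy()]

print(codes.tolist())
print(list(decoded))
```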