pd.Categorical.from_codes with missing values

Asked: 2017-01-21 12:51:14

Tags: pandas

Suppose I have:

df = pd.DataFrame({'gender': np.random.choice([1, 2], 10), 'height': np.random.randint(150, 210, 10)})

I want to convert the gender column to a categorical. If I try:

df['gender'] = pd.Categorical.from_codes(df['gender'], ['female', 'male'])

it fails.

I can pad the categories with a filler entry:

df['gender'] = pd.Categorical.from_codes(df['gender'], ['N/A', 'female', 'male'])

but then 'N/A' shows up in the output of some methods:

In [67]: df['gender'].value_counts()
Out[67]: 
female    5
male      5
N/A       0
Name: gender, dtype: int64

I considered using None as the filler value. It works as expected in value_counts, but I get a warning:

opt/anaconda3/bin/ipython:1: FutureWarning: 
Setting NaNs in `categories` is deprecated and will be removed in a future version of pandas.
  #!/opt/anaconda3/bin/python

Is there a better way? Also, is there a way to explicitly provide a mapping from codes to categories?

3 Answers:

Answer 0: (score: 1)

You can use the rename_categories() method:

Demo:

In [33]: df
Out[33]:
   gender  height
0       1     203
1       2     169
2       2     181
3       1     172
4       2     174
5       1     166
6       2     187
7       2     200
8       1     208
9       1     201

In [34]: df['gender'] = df['gender'].astype('category').cat.rename_categories(['male','female'])

In [35]: df
Out[35]:
   gender  height
0    male     203
1  female     169
2  female     181
3    male     172
4  female     174
5    male     166
6  female     187
7  female     200
8    male     208
9    male     201

In [36]: df.dtypes
Out[36]:
gender    category
height       int32
dtype: object
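
The list form above relies on the new names lining up with the sorted order of the existing categories. As a minimal sketch of a more explicit variant, rename_categories() also accepts a dict-like mapping from old category to new name. The small fixed DataFrame and the name code_to_label are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

# Illustrative fixed data (the question uses random values instead).
df = pd.DataFrame({'gender': [1, 2, 2, 1, 2],
                   'height': [203, 169, 181, 172, 174]})

# Hypothetical explicit mapping from each integer code to its label.
code_to_label = {1: 'female', 2: 'male'}

# Convert to category first, then rename each category via the dict.
df['gender'] = df['gender'].astype('category').cat.rename_categories(code_to_label)

print(df['gender'].cat.categories)
print(df['gender'].value_counts())
```

With a dict, each label is tied to its code by key, so the result does not depend on the order in which the labels are written.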

Answer 1: (score: 1)

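Convert the column to a categorical, then assign the new category names directly to its .categories attribute, which renames the existing categories in place (newer pandas releases no longer allow this assignment; rename_categories() is the replacement):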

df['gender'] = df['gender'].astype('category')
df['gender'].cat.categories = ['female', 'male']

df['gender'].value_counts()
Out[23]:
female    7
male      3
Name: gender, dtype: int64

df.dtypes
Out[24]:
gender    category
height       int32
dtype: object

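If you want a dict that maps the original codes to their corresponding categories, capture the old categories before renaming them, then: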

old = df['gender'].cat.categories
new = ['female', 'male']

dict(zip(old, new))
Out[28]:
{1: 'female', 2: 'male'}
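
Such a dict can also drive the conversion itself. The following is a sketch under stated assumptions rather than the answerer's method: it uses Series.map with a hypothetical code_to_label dict and a fixed illustrative DataFrame in which code 3 deliberately has no label. Codes missing from the dict become NaN, so no filler category is needed:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Illustrative data; code 3 has no entry in the mapping below.
df = pd.DataFrame({'gender': [1, 2, 2, 1, 3]})

# Hypothetical explicit mapping from code to label.
code_to_label = {1: 'female', 2: 'male'}

# Fix the set and order of categories up front.
gender_dtype = CategoricalDtype(categories=['female', 'male'])

# map() turns unmapped codes into NaN; astype() then builds the categorical.
df['gender'] = df['gender'].map(code_to_label).astype(gender_dtype)

print(df['gender'].value_counts())
print(df['gender'].isna().sum())
```

Here value_counts() lists only the real categories, and the unmapped row is reported as missing instead of as an 'N/A' category.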

Answer 2: (score: 0)

The error you get from pd.Categorical.from_codes(df['gender'], ['female', 'male']) tells you that the codes need to be 0-based indexes into the categories.

So you can satisfy that directly in your DataFrame declaration:

df = pd.DataFrame({'gender': np.random.choice([0, 1], 10), 'height': np.random.randint(150, 210, 10)})
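
If the data already exists with 1-based codes, one option is to shift the codes instead of regenerating the frame. This is a minimal sketch, assuming the codes are exactly 1 and 2 as in the question; the fixed sample values are illustrative:

```python
import pandas as pd

# Illustrative data with the 1-based codes from the question.
df = pd.DataFrame({'gender': [1, 2, 2, 1, 2],
                   'height': [203, 169, 181, 172, 174]})

# from_codes expects 0-based positions into the categories list,
# so subtract 1 to turn codes 1 and 2 into positions 0 and 1.
codes = df['gender'].to_numpy() - 1
df['gender'] = pd.Categorical.from_codes(codes, categories=['female', 'male'])

print(df['gender'].value_counts())
```

In from_codes, a code of -1 stands for a missing value, so a sentinel in the raw data could be mapped to -1 rather than to a filler category.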