Suppose I have:
df = pd.DataFrame({'gender': np.random.choice([1, 2], 10), 'height': np.random.randint(150, 210, 10)})
I want to convert the gender column to a categorical. If I try:
df['gender'] = pd.Categorical.from_codes(df['gender'], ['female', 'male'])
it fails.
I can pad the categories with a placeholder:
df['gender'] = pd.Categorical.from_codes(df['gender'], ['N/A', 'female', 'male'])
but then 'N/A' is returned by some methods:
In [67]: df['gender'].value_counts()
Out[67]:
female 5
male 5
N/A 0
Name: gender, dtype: int64
I considered using None as the padding value. It works as expected in value_counts, but I get a warning:
opt/anaconda3/bin/ipython:1: FutureWarning:
Setting NaNs in `categories` is deprecated and will be removed in a future version of pandas.
#!/opt/anaconda3/bin/python
Is there a better way? Also, is there a way to explicitly provide a mapping from codes to categories?
Answer 0: (score: 1)
You can use the rename_categories() method.
Demo:
In [33]: df
Out[33]:
gender height
0 1 203
1 2 169
2 2 181
3 1 172
4 2 174
5 1 166
6 2 187
7 2 200
8 1 208
9 1 201
In [34]: df['gender'] = df['gender'].astype('category').cat.rename_categories(['male','female'])
In [35]: df
Out[35]:
gender height
0 male 203
1 female 169
2 female 181
3 male 172
4 female 174
5 male 166
6 female 187
7 female 200
8 male 208
9 male 201
In [36]: df.dtypes
Out[36]:
gender category
height int32
dtype: object
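The list form of rename_categories() above is positional: the new names are matched to the existing categories in their sorted order. As a minimal sketch of a more explicit variant, rename_categories() also accepts a dict from old category to new category. The frame below reuses the demo values, and the name code_to_label and the 1 = female, 2 = male assignment are assumptions taken from the question rather than from this answer.

```python
import pandas as pd

# Same values as the demo frame above, hard-coded so the result is reproducible.
df = pd.DataFrame({
    "gender": [1, 2, 2, 1, 2, 1, 2, 2, 1, 1],
    "height": [203, 169, 181, 172, 174, 166, 187, 200, 208, 201],
})

# Hypothetical mapping; the question implies 1 = female, 2 = male.
code_to_label = {1: "female", 2: "male"}

# A dict is applied per category, so the result does not depend on category order.
df["gender"] = df["gender"].astype("category").cat.rename_categories(code_to_label)

print(df["gender"].cat.categories.tolist())
```

Because only the observed codes become categories, no placeholder such as 'N/A' appears in value_counts().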
Answer 1: (score: 1)
Assign the new categories directly to the .categories attribute, which renames the existing categories to the following values:
df['gender'] = df['gender'].astype('category')
df['gender'].cat.categories = ['female', 'male']
df['gender'].value_counts()
Out[23]:
female 7
male 3
Name: gender, dtype: int64
df.dtypes
Out[24]:
gender category
height int32
dtype: object
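If you want a dict that maps the codes to their corresponding categories, build it while the categories are still the original codes (that is, before the renaming step):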
old = df['gender'].cat.categories
new = ['female', 'male']
dict(zip(old, new))
Out[28]:
{1: 'female', 2: 'male'}
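Such a dict can also be applied directly, which avoids assigning to .cat.categories (something that, as far as I know, recent pandas releases no longer allow). The following is only a sketch under that assumption: it uses Series.map and an explicit pd.CategoricalDtype, and the sample codes, including the unmapped code 3, are made up for illustration.

```python
import pandas as pd

# Made-up sample codes; 3 is deliberately absent from the mapping.
codes = pd.Series([1, 2, 2, 1, 2, 3], name="gender")
mapping = {1: "female", 2: "male"}

# Fix the category set up front so it does not depend on which codes occur.
gender_dtype = pd.CategoricalDtype(categories=list(mapping.values()))

# Codes missing from the mapping become NaN instead of needing a placeholder category.
gender = codes.map(mapping).astype(gender_dtype)

print(gender.value_counts())
```

Unmapped codes end up as missing values, so value_counts() reports only the two real categories.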
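Answer 2: (score: 0)
The error you get from pd.Categorical.from_codes(df['gender'], ['female', 'male']) tells you that the codes need to be 0-indexed, because each code is used as a position in the categories list.
So you can make it work by changing your DataFrame declaration to generate 0 and 1: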
df = pd.DataFrame({'gender': np.random.choice([0, 1], 10), 'height': np.random.randint(150, 210, 10)})
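If the data already exists with codes 1 and 2, one option is to shift the codes at conversion time rather than regenerate the frame. This is a minimal sketch that assumes 1 means female and 2 means male, as the question implies; the sample values are hard-coded for reproducibility.

```python
import pandas as pd

# Hard-coded sample with the 1-based codes from the question.
df = pd.DataFrame({
    "gender": [1, 2, 2, 1, 2],
    "height": [203, 169, 181, 172, 174],
})

# from_codes treats each code as a 0-based position in `categories`,
# so subtracting 1 turns codes 1 and 2 into positions 0 and 1.
df["gender"] = pd.Categorical.from_codes(df["gender"] - 1, categories=["female", "male"])

print(df["gender"].value_counts())
```

No placeholder category is needed this way, so nothing extra shows up in value_counts().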