Question

I have a categorical column that I want to fill using a Series. I tried:

import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b'], 'value': ['c', np.nan]})
df['value'] = df['value'].astype("category")
df['value'] = df['value'].cat.add_categories(df['key'].unique())
print(df['value'].cat.categories)
df['value'] = df['value'].fillna(df['key'])
print(df)

Expected output:

Index(['c', 'a', 'b'], dtype='object')
  key value
0   a     c
1   b     b

Actual output:

Index(['c', 'a', 'b'], dtype='object')
  key value
0   a     a
1   b     b

Answer 1

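This looks like a bug, but thankfully the workaround is simple: treat "value" as a string (object) column while filling, then convert the result back to categorical.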

df['value'] = pd.Categorical(
    df.value.astype(object).fillna(df.key), categories=df.stack().unique())
df

  key value
0   a     c
1   b     b
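The same workaround can be wrapped in a small helper. This is only a sketch: `fillna_categorical` is a hypothetical name, not a pandas function, and it assumes you want to keep the existing categories in their original order and append any new values that the fill introduces.

```python
import numpy as np
import pandas as pd


def fillna_categorical(col: pd.Series, filler: pd.Series) -> pd.Series:
    """Hypothetical helper: fill NaNs in a categorical Series from another Series."""
    # Fill on an object-dtype copy so values are taken from `filler` by index label.
    filled = col.astype(object).fillna(filler)
    # Keep the existing categories first, then append values the fill introduced.
    new_values = [v for v in pd.unique(filled.dropna())
                  if v not in col.cat.categories]
    categories = list(col.cat.categories) + new_values
    return filled.astype(pd.CategoricalDtype(categories=categories))


df = pd.DataFrame({'key': ['a', 'b'], 'value': ['c', np.nan]})
df['value'] = df['value'].astype('category')
df['value'] = fillna_categorical(df['value'], df['key'])
print(df)
```

Unlike passing `df.stack().unique()`, this only adds categories that actually appear in the filled column, which may or may not be what you want.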

Answer 2

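According to the docs, fillna on categorical data accepts a scalar, not a Series, so you may need to convert the column back to a plain object Series first: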

df.value.astype('object').fillna(df.key) # then convert to category again
Out[248]: 
0    c
1    b
Name: value, dtype: object

value : scalar. Value to use to fill holes (e.g. 0)
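For completeness, the full round trip (object, fill, back to category) might look like the sketch below. It assumes you are happy to let pandas infer the categories from the filled values, in which case they come out sorted rather than in the original order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b'], 'value': ['c', np.nan]})
df['value'] = df['value'].astype('category')

# Step 1: drop to object dtype so fillna takes values from the Series by index.
filled = df['value'].astype('object').fillna(df['key'])

# Step 2: convert back; categories are inferred from the values that are present.
df['value'] = filled.astype('category')

print(df)
print(df['value'].cat.categories)
```

If the category order matters, pass an explicit `pd.CategoricalDtype(categories=[...])` to `astype` instead of the string `'category'`.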

fillna on a categorical column does not work correctly with a Series input
