Question

I am trying to fill missing values (NaN) with the following code:

NAN_SUBSTITUTION_VALUE = 1
g = g.fillna(NAN_SUBSTITUTION_VALUE)

But I get the following error:

ValueError: fill value must be in categories.

Could someone please explain this error?

Answer 1

Add the category before filling:

g = g.cat.add_categories([1])
g = g.fillna(1)
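For completeness, here is a minimal self-contained sketch of the same idea. The sample Series is made up for illustration; only add_categories and fillna come from the answer above.

```python
import numpy as np
import pandas as pd

NAN_SUBSTITUTION_VALUE = 1

# Hypothetical sample data: a categorical Series with one missing value
g = pd.Series([10, 20, np.nan, 20], dtype="category")

# Register the fill value as a category first, then fill.
# fillna returns a new Series, so the result has to be assigned back.
g = g.cat.add_categories([NAN_SUBSTITUTION_VALUE])
g = g.fillna(NAN_SUBSTITUTION_VALUE)

print(g)
```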

Answer 2

Your question leaves out an important point: what g is, and in particular that it has dtype categorical. I assume it is something like this:

g = pd.Series(["A", "B", "C", np.nan], dtype="category")

The problem you are running into is that fillna requires a value that already exists as a category. For example, g.fillna("A") works, but g.fillna("D") fails. To fill the series with a new value, you can do the following:

g = g.cat.add_categories("D").fillna("D")
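To see the difference between an existing and a new category in practice, here is a small sketch. The exception type seems to depend on the pandas version (ValueError in the question, TypeError in some other releases as far as I know), so the sketch catches both; that is an assumption on my side.

```python
import numpy as np
import pandas as pd

g = pd.Series(["A", "B", "C", np.nan], dtype="category")

# Filling with a value that is already a category works
filled_existing = g.fillna("A")

# Filling with a value that is not a category raises an exception
try:
    g.fillna("D")
    new_value_failed = False
except (ValueError, TypeError) as exc:
    new_value_failed = True
    print(f"fillna('D') failed: {exc}")

# Adding the category first makes the same fill succeed
g_without_nan = g.cat.add_categories("D").fillna("D")
print(g_without_nan)
```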

Answer 3

Once categorical data has been created, you can only insert values that are among its categories.

>>> df
    ID  value
0    0     20
1    1     43
2    2     45

>>> df["cat"] = df["value"].astype("category")
>>> df
    ID  value    cat
0    0     20     20
1    1     43     43
2    2     45     45

>>> df.loc[1, "cat"] = np.nan
>>> df
    ID  value    cat
0    0     20     20
1    1     43    NaN
2    2     45     45

>>> df.fillna(1)
ValueError: fill value must be in categories
>>> df.fillna(43)
    ID  value    cat
0    0     20     20
1    1     43     43
2    2     45     45
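
If you do not know in advance whether the fill value is already a category, you can check first and add it only when needed. A rough sketch: fillna_categorical is a helper name I made up, and the frame mirrors the example above.

```python
import numpy as np
import pandas as pd


def fillna_categorical(s, value):
    """Hypothetical helper: fill NaN in a categorical Series,
    adding `value` to the categories only if it is not there yet."""
    if value not in s.cat.categories:
        s = s.cat.add_categories([value])
    return s.fillna(value)


df = pd.DataFrame({"ID": [0, 1, 2], "value": [20, 43, 45]})
df["cat"] = df["value"].astype("category")
df.loc[1, "cat"] = np.nan

# 1 is not a category yet, so it gets added before filling
filled_new = fillna_categorical(df["cat"], 1)

# 43 is already a category, so the categories stay unchanged
filled_existing = fillna_categorical(df["cat"], 43)
```

The helper returns a new Series and leaves df["cat"] untouched, so assign the result back if you want to keep it.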

Answer 4

Sometimes you may want to replace NaN with values that already exist in your dataset. In that case you can use this:

#creates a random permutation of the categorical values
permutation = np.random.permutation(df[field])

#erase the empty values
empty_is = np.where(permutation == "")
permutation = np.delete(permutation, empty_is)

#replace all empty values of the dataframe[field]
end = len(permutation)
df[field] = df[field].apply(lambda x: permutation[np.random.randint(end)] if pd.isnull(x) else x)

It works very well.
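
Note that the snippet above only filters out empty strings, so NaN itself can still end up in the pool of replacement values. Here is a variant sketch that draws only from the non-missing values. It is not the code above but a reworking of it: the sample frame, the column name "color", and the seeded np.random.default_rng generator are all my own assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

# Hypothetical sample data with two missing entries
df = pd.DataFrame({"color": pd.Categorical(["red", "blue", None, "red", None])})
field = "color"

# Pool of observed (non-missing) values to draw from
observed = df[field].dropna().to_numpy()
missing = df[field].isna()

# Work on an object copy, draw one observed value per missing entry,
# then restore the original categorical dtype
values = df[field].astype(object)
values[missing] = rng.choice(observed, size=int(missing.sum()))
df[field] = values.astype(df[field].dtype)

print(df)
```

Because every replacement comes from the observed values, no new category is needed and the original categories are preserved.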

Answer 5

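As many have said before, this error comes from the fact that the feature has dtype "category".
I suggest converting it to string first, then using fillna, and finally converting it back to category if needed.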

g = g.astype('string')
g = g.fillna(NAN_SUBSTITUTION_VALUE)
g = g.astype('category')
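
A self-contained sketch of this round trip, with made-up sample data. I convert the fill value with str() because the pandas string dtype is meant to hold strings; that conversion is my addition, not part of the snippet above.

```python
import numpy as np
import pandas as pd

NAN_SUBSTITUTION_VALUE = 1

# Hypothetical sample data
g = pd.Series(["A", "B", np.nan], dtype="category")

g = g.astype("string")                      # leave the categorical dtype
g = g.fillna(str(NAN_SUBSTITUTION_VALUE))   # fill with the value as text
g = g.astype("category")                    # convert back if needed

print(g)
```

Keep in mind that after the round trip the fill value is the string "1" rather than the integer 1.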

Pandas - Fill NaN in categorical data
