Question

Suppose 'df' is a DataFrame object and 'ca' is one of its columns.

>>> df.ca.value_counts()
0.0    176
1.0     65
2.0     38
3.0     20
?        4
Name: ca, dtype: int64

As you can see, I have four missing values. I want to fill them in, using the following code:

>>> df.loc[df.ca == '?', 'ca'] = 0.0
>>> df.ca.value_counts()
0.0    176
1.0     65
2.0     38
3.0     20
0.0      4
Name: ca, dtype: int64

Why do I still have 5 unique values? I want the fifth row merged into the first, i.e.

0.0   176 + 4 = 180
1.0     65
2.0     38
3.0     20

How can I fix this?

Answer 1

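Since '?' is one of your values, I can tell that df.ca has dtype object or string. After replace('?', 0.), the column contains both the string '0.0' and the float 0.0, which value_counts() treats as two different values. Convert everything to float and the problem should go away.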

df.ca.replace('?', 0.).astype(float).value_counts()

0.0    180
1.0     65
2.0     38
3.0     20
dtype: int64
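
To make the explanation above concrete, here is a minimal sketch. It assumes the numbers in the column are stored as strings (as they would be if the data had been read from a file in which '?' marks a missing entry); the counts are copied from the question, and the variable names are only illustrative.

```python
import pandas as pd

# Assumption: the column holds strings such as '0.0', not real floats.
values = ["0.0"] * 176 + ["1.0"] * 65 + ["2.0"] * 38 + ["3.0"] * 20 + ["?"] * 4
df = pd.DataFrame({"ca": pd.Series(values, dtype=object)})

# Same assignment as in the question: four cells now hold the float 0.0.
df.loc[df["ca"] == "?", "ca"] = 0.0

# The string '0.0' and the float 0.0 are different keys, so five values remain.
n_before = df["ca"].nunique()

# Casting the whole column to float turns both spellings into the same number.
counts = df["ca"].astype(float).value_counts()
n_after = len(counts)

print(n_before, n_after)
print(counts)
```

If the assumption holds, the count for 0.0 after the cast should be 176 + 4 = 180.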

Answer 2

The following also works:

In [193]: df = pd.DataFrame({'ca': [0.0]*176 + [1.0]*65 + [2.0]*38 + [3.0]*20 + ['?']*4})

In [194]: df.ca.value_counts()
Out[194]: 
0.0    176
1.0     65
2.0     38
3.0     20
?        4
Name: ca, dtype: int64

In [195]: df.loc[df.ca == '?', 'ca'] = 0.0

In [196]: df.ca.value_counts()
Out[196]: 
0.0    180
1.0     65
2.0     38
3.0     20
Name: ca, dtype: int64
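
Note that the session above builds the column from real floats, which may be why it ends with four values while the question ends with five. If your column actually holds strings, one alternative (not taken from either answer) is to parse it with pd.to_numeric and then fill the gaps. A small sketch with made-up data:

```python
import pandas as pd

# Hypothetical column of strings in which '?' marks a missing entry.
raw = pd.Series(["0.0", "1.0", "?", "2.0", "?"], dtype=object, name="ca")

# Inspect the Python type of each element to see what the column really holds.
type_names = raw.map(lambda v: type(v).__name__).unique().tolist()

# Parse to numbers; anything that cannot be parsed (here '?') becomes NaN.
numeric = pd.to_numeric(raw, errors="coerce")

# Fill the missing entries with 0.0, as the question intends.
filled = numeric.fillna(0.0)

print(type_names)
print(filled.value_counts())
```

Checking the element types first tells you whether a plain assignment is enough or whether a conversion is needed.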

Filling missing values in a pandas DataFrame gives the wrong result
