Question

I have a Pandas Series with numeric data, and I want to find its unique values and how often each one appears. I use the standard procedure:

# Given that my_data is the name of a column of the pd.DataFrame df
unique = df[my_data].value_counts()
print(unique)

Here is the result I get:

# -------------------OUTPUT
-0.010000    46483 
-0.010000    16895
-0.027497    12215
-0.294492    11915
 0.027497    11397

What I don't understand is why the "same value" (-0.01) appears twice. Is this some internal threshold (for small values), or am I doing something wrong?
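A quick way to check whether the two -0.010000 labels really are the same number is to print the exact values stored in the value_counts() index with repr(). A minimal sketch, assuming hypothetical data in which one value differs from -0.01 by 1e-12:

```python
import pandas as pd

# Hypothetical data: all three values should display as -0.01 at the default precision
s = pd.Series([-0.01, -0.01, -0.01 - 1e-12])
counts = s.value_counts()

# The index holds the exact float64 keys; repr() shows every significant digit,
# so two labels that print identically in the Series output become distinguishable
for value, n in counts.items():
    print(repr(value), n)
```

If two keys print differently here, value_counts() is treating them as distinct values, which points at floating-point precision rather than a threshold.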

Update

If I store the DataFrame in a CSV file and read it back, I get the correct result:

# -------------------OUTPUT
-0.010000    63378
-0.027497    12215
-0.294492    11915
 0.027497    11397

Solution

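Based on the discussion, I found the source of the problem and the solution. As explained above, it is a floating-point precision issue, and it can be solved by rounding the values. However, I would not have been able to see the difference without

pd.set_option('display.float_format', repr)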
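A minimal sketch of that display option, assuming hypothetical data in which two keys differ only around the 12th decimal place. Setting display.float_format to repr should make pandas format floats with every significant digit, so the two keys no longer look identical; pd.reset_option restores the default afterwards so the change does not leak into later output.

```python
import pandas as pd

# Hypothetical data: the third value differs from -0.01 by 1e-12
s = pd.Series([-0.01, -0.01, -0.01 - 1e-12])

# Default formatting: both index labels are rounded to the display precision
print(s.value_counts())

# Format floats with repr() so every significant digit is shown
pd.set_option('display.float_format', repr)
print(s.value_counts())

# Restore the default formatter
pd.reset_option('display.float_format')
```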

Thank you very much for your help!!

Answer 1

I think this is a floating-point precision issue, similar to the following:

In [1]: 0.1 + 0.2
Out[1]: 0.30000000000000004

In [2]: 0.1 + 0.2 == 0.3
Out[2]: False
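The comparison can be made tolerant instead of exact. A small sketch using only the standard library: math.isclose compares within a relative tolerance (1e-09 by default), and rounding, the approach suggested below, maps both values onto the same float.

```python
import math

a = 0.1 + 0.2
print(a == 0.3)               # False: exact binary comparison
print(math.isclose(a, 0.3))   # True: equal within a relative tolerance
print(round(a, 6) == 0.3)     # True: rounding snaps both to the same float
```

value_counts() groups by exact equality, like the first comparison, which is why near-identical floats end up in separate rows.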

So try this:

df[my_data].round(6).value_counts()

Update

Demo:

In [14]: s = pd.Series([-0.01, -0.01, -0.01000000000123, 0.2])

In [15]: s
Out[15]:
0   -0.01
1   -0.01
2   -0.01
3    0.20
dtype: float64

In [16]: s.value_counts()
Out[16]:
-0.01    2
-0.01    1
 0.20    1
dtype: int64

In [17]: s.round(6).value_counts()
Out[17]:
-0.01    3
 0.20    1
dtype: int64
