Question

I have a problem where pandas round() does not work as expected after I specify my own dtypes. I am using pandas 0.24.2.

Suppose my data is float64 and I want it to be float32 to save some memory, and I also want to do some rounding:

import pandas as pd

my_dtypes = {'val': 'float32'}
my_decimals = {'val': 4}

df = pd.DataFrame({'val': [0.14579999446868896]}) # <- this will be 'float64' 
df_mydtypes = df.astype(my_dtypes)

df_rounded = df.round(my_decimals)
df_mydtypes_rounded = df_mydtypes.round(my_decimals)

After rounding to four decimal places, one would expect the output to be 0.1458.

print(df_rounded['val'])
print(df_mydtypes_rounded['val'])

print(df_rounded['val'].item())
print(df_mydtypes_rounded['val'].item())

On the surface this looks fine, but if we look closely (as my unit tests do), the values differ:

0    0.1458
Name: val, dtype: float64
0    0.1458
Name: val, dtype: float32
0.1458
0.14579999446868896

What is going on here?

Answer 1

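I think this is related to a more general issue in computer science, namely how floating-point numbers are stored. 0.1458 has no exact binary representation, and float32 keeps only about 7 significant decimal digits, so the rounding does happen: the result is simply the float32 closest to 0.1458. When that value is converted to a 64-bit Python float and printed at full precision, it shows up as 0.14579999446868896. For a detailed explanation, see "Floating Point Arithmetic: Issues and Limitations" in the Python documentation.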
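
To make the point concrete, here is a minimal sketch using numpy only (no pandas involved). The variable names x32 and x64 are just illustrative; the idea is that the "wrong" digits are already present in the float32 value and only become visible once it is widened to float64.

```python
import numpy as np

# float32 has a 24-bit significand, so 0.1458 is stored as the nearest
# representable value rather than exactly.
x32 = np.float32(0.1458)

# Widening to a Python float (float64) is exact: it exposes the digits
# that float32 was already holding.
x64 = float(x32)

print(x32)                     # prints 0.1458 (shortest string that round-trips in float32)
print(repr(x64))               # prints 0.14579999446868896
print(x64 == 0.1458)           # False: the float64 nearest to 0.1458 is a different number
print(np.float32(x64) == x32)  # True: narrowing back gives the same float32
```

So the two printed values in the question are the same float32 number, shown at two different precisions.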

Some workarounds:

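I noticed that .values and .iloc do display the expected number, whereas to_list() and .item() do not. My guess is that this comes from how pandas exposes the underlying numpy array: the first two keep the value as a numpy float32, while the latter two convert it to a 64-bit Python float.
Python also has the decimal module, in case you need human-style decimal numbers rather than the computer's binary floats.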
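
As a rough sketch of how a unit test could cope with this, assuming the same one-row frame as in the question: the three options below (tolerance comparison, comparison in float32, and conversion through decimal) are illustrative choices, not the only way to do it, and the names value and exact are made up for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [0.14579999446868896]}).astype({'val': 'float32'})
rounded = df.round({'val': 4})

# Convert the float32 result to a plain Python float, as .item() would.
value = float(rounded['val'].iloc[0])

# Option 1: compare with a tolerance suited to float32 precision.
assert np.isclose(value, 0.1458, atol=1e-6)

# Option 2: compare in float32, so both sides are rounded the same way.
assert np.float32(value) == np.float32(0.1458)

# Option 3: go through decimal when exact decimal digits are required.
exact = Decimal(str(value)).quantize(Decimal('0.0001'), rounding=ROUND_HALF_UP)
print(exact)  # prints 0.1458
```

Option 1 is probably the least surprising in a test suite; option 3 trades speed for exact decimal semantics.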
