Question

I got the original dataset from https://www.kaggle.com/gustavomodelli/forest-fires-in-brazil

Acre_dataset is a subset of the original dataset.

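I am trying to sum the values in the 'number' column, grouped by the distinct values in the 'year' column, as shown in the following screenshot.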

Acre_dataset contains about 300 rows, and the values in the 'number' column have at most three decimal places (so there are no numbers like 1.0001, but there are numbers like 1.001).

The code in the screenshot:

Acre_firecount = [0] * len(year_ls)
print(type(Acre_dataset.iloc[0]['number']))
for i in range(len(Acre_dataset)):
    for j in range(len(year_ls)):
        if Acre_dataset.iloc[i]['year'] == year_ls[j]:
            Acre_firecount[j] += Acre_dataset.iloc[i]['number']
print(Acre_firecount)
type(Acre_firecount[12])

However, two odd-looking numbers appear in the resulting list: 475.21299999999997 and 618.4300000000001.

I have checked the data type of the values in the 'number' column of Acre_dataset and the data type of the elements in the resulting list Acre_firecount. Both are numpy.float64.

Why does this happen, and how can I avoid it?

Answer 1

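I found the explanation in the Python documentation. It says:

Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine.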
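To make that concrete, here is a minimal sketch using only the standard library. The list `values` is a made-up stand-in for the 'number' column, not data from Acre_dataset; the two totals at the end are the ones reported in the question. It shows the representation error building up in a running sum, and two common ways to deal with it: rounding the result to the known precision, or doing the arithmetic in `decimal.Decimal`.

```python
from decimal import Decimal
import math

# Hypothetical values standing in for the 'number' column.
values = [0.1, 0.2, 0.3]

# Most decimal fractions have no exact binary representation, so a float
# stores the nearest representable value. Decimal(float) reveals it exactly.
exact_stored = Decimal(0.1)
print(exact_stored)

# A running sum, as in the question's loop, accumulates those tiny errors.
total = 0.0
for v in values:
    total += v
print(total)

# Option 1: round the result to the precision the data is known to have.
rounded_total = round(total, 3)
print(rounded_total)

# Option 2: compare with a tolerance instead of testing exact equality.
close_enough = math.isclose(total, 0.6)

# Option 3: do the arithmetic in decimal, building values from strings.
decimal_total = sum(Decimal(s) for s in ["0.1", "0.2", "0.3"])
print(decimal_total)

# The two totals from the question, rounded to three decimal places.
fixed = [round(x, 3) for x in (475.21299999999997, 618.4300000000001)]
print(fixed)
```

The stored sums are not wrong in any practical sense; they differ from the expected values by about 1e-13. If the column names are as in the question, `Acre_dataset.groupby('year')['number'].sum().round(3)` should produce the per-year totals already rounded, without the nested loops.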
