Question

I have the following pandas DataFrame, df:

|      clm1  |      clm2|
|     79.02  |     80.98|
|     78.55  |     81.47|
|     98.99  |    101.01|
|    999.54  |    999.55|
|    999.55  |    999.55|

I am performing the following calculation on it:

df['avg'] = (df['clm1']+df['clm2'])/2

print(df)

| clm1   |    clm2   |  avg   |
|79.02   |    80.98  | 80.000 |
|78.55   |    81.47  | 80.010 |
|98.99   |   101.01  |100.000 |
|999.54  |   999.55  |999.545 |
|999.55  |   999.55  |999.550 |

When I write this DataFrame to CSV, I get an incorrect result.

df.to_csv('myfile.csv')

clm1  , clm2  , avg
79.02 , 80.98 , 80.0
78.55 , 81.47 , 80.00999999999999  # This should be 80.01
98.99 , 101.01, 100.0
999.54, 999.55, 999.545
999.55, 999.55, 999.55

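I understand floating-point issues and have already gone through the following answers:

Python float - str - float weirdness
Is floating point math broken?

These suggest using Decimal instead of float, but I could not work out how to apply that here. Note: I do not want to use any rounding method. I need the exact result.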
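The gap between the printed frame and the CSV can be reproduced without pandas. The following is a minimal sketch, not part of the original question: the printed frame shows a shortened display of each float, while the CSV receives the full round-trip representation. The variable names are illustrative.

```python
from decimal import Decimal

# The second row of the question, computed with binary floats.
float_avg = (78.55 + 81.47) / 2

# repr() shows every digit needed to round-trip the float.
print(repr(float_avg))

# A display limited to three decimals hides any noise in the last digits.
print(f"{float_avg:.3f}")

# Decimal(float) exposes the exact binary value actually stored for 78.55.
print(Decimal(78.55))

# Building Decimals from strings keeps the arithmetic exact in base 10.
exact_avg = (Decimal("78.55") + Decimal("81.47")) / 2
print(exact_avg)
```

The last line prints the base-10 result because the inputs never pass through a binary float.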

Answer 1

Here is an option that casts to integers to avoid rounding. It works when taking the average of two columns.

#recreate data
import pandas as pd

df = pd.DataFrame([[79.02,80.98],
                   [78.55,81.47],
                   [98.99,101.01],
                   [999.54,999.55],
                   [999.55,999.55]], columns = ['clm1','clm2'])

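#scale by 1000 and cast all values to integers
#(round() keeps a scaled value that lands just below a whole number from being truncated)
df = (df*1000).round().astype(int)
df['avg'] = ((df['clm1']+df['clm2'])/2).astype(int)

#return to floating point
df = (df/1000)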
df.to_csv('pandasfile.csv')

The output of the above is:

,clm1,clm2,avg
0,79.02,80.98,80.0
1,78.55,81.47,80.01
2,98.99,101.01,100.0
3,999.54,999.55,999.545
4,999.55,999.55,999.55
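The scaled-integer idea can also be sketched on plain Python numbers. This is only an illustration, under the assumption that every input has at most two decimal places, so that a scale of 1000 keeps the average of two values a whole number; `SCALE`, `to_fixed` and `fixed_to_str` are hypothetical helper names, not pandas API.

```python
SCALE = 1000  # assumption: inputs have at most 2 decimals, so halving a sum stays whole


def to_fixed(value: float) -> int:
    """Scale a float to an integer count of thousandths.

    round() is used only to land on the intended whole number, because
    value * SCALE can come out a hair off in binary floating point.
    """
    return round(value * SCALE)


def fixed_to_str(units: int) -> str:
    """Format a non-negative integer count of thousandths as a decimal string."""
    whole, frac = divmod(units, SCALE)
    return f"{whole}.{frac:03d}"


rows = [(79.02, 80.98), (78.55, 81.47), (98.99, 101.01),
        (999.54, 999.55), (999.55, 999.55)]

# Integer addition and halving are exact here because each scaled sum is even.
averages = [fixed_to_str((to_fixed(a) + to_fixed(b)) // 2) for a, b in rows]
print(averages)
```

Formatting the integers directly as text avoids converting back to float before writing.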

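Another option: using the Decimal class with pandas also works, but it is tedious and slow if you have to convert a large number of floats in a DataFrame to Decimals. Assuming you import everything as Decimal, the process looks like this.

from decimal import Decimal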

df = pd.DataFrame([[Decimal("79.02"),Decimal("80.98")],
                   [Decimal("78.55"),Decimal("81.47")],
                   [Decimal("98.99"),Decimal("101.01")],
                   [Decimal("999.54"),Decimal("999.55")],
                   [Decimal("999.55"),Decimal("999.55")]], columns = ['clm1','clm2'])

df['avg'] = (df['clm1']+df['clm2'])/2
df.to_csv('pandasfile.csv')

This gives the following in the csv file:

,clm1,clm2,avg
0,79.02,80.98,80.00
1,78.55,81.47,80.01
2,98.99,101.01,100.00
3,999.54,999.55,999.545
4,999.55,999.55,999.55
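If the data starts out in a file, one way to "import everything as Decimal" is to parse the text straight into Decimal objects so the values never pass through binary floats. The sketch below assumes the input is CSV text with the two columns from the question; the in-memory `io.StringIO` source stands in for a real file, and `converters` is the `pandas.read_csv` parameter that applies a function to each raw string in a column.

```python
import io
from decimal import Decimal

import pandas as pd

# Stand-in for a real input file (assumption: same columns as in the question).
raw = io.StringIO(
    "clm1,clm2\n"
    "79.02,80.98\n"
    "78.55,81.47\n"
    "98.99,101.01\n"
    "999.54,999.55\n"
    "999.55,999.55\n"
)

# Each cell is handed to Decimal as a string, so no float conversion happens.
df = pd.read_csv(raw, converters={"clm1": Decimal, "clm2": Decimal})

# Object-dtype arithmetic falls back to Decimal's own operators, element by element.
df["avg"] = (df["clm1"] + df["clm2"]) / 2

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv()
print(csv_text)
```

The columns end up with object dtype, which is the source of the slowdown mentioned above.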

Original answer: you can use the float_format parameter of the to_csv method.

df['avg'] = (df['clm1']+df['clm2'])/2

Use float_format to specify the number of decimal places:

df.to_csv('pandasfile.csv', float_format='%.3f')

This writes the following to the csv file:

,clm1,clm2,avg
0,79.020,80.980,80.000
1,78.550,81.470,80.010
2,98.990,101.010,100.000
3,999.540,999.550,999.545
4,999.550,999.550,999.550

Answer 2

Here is a small example using the Decimal class (although not with pandas):

from decimal import Decimal

xs = [Decimal("79.02"), Decimal("78.55"), Decimal("98.99"),
     Decimal("999.54"), Decimal("999.55")]

ys = [Decimal("80.98"), Decimal("81.47"), Decimal("101.01"), 
      Decimal("999.55"), Decimal("999.55")]

# conversion with str() is to align columns
for x, y in zip(xs, ys):
    print(f'{str(x):>8s} {str(y):>8s} {str((x + y) / 2):>8s}')

   79.02    80.98    80.00
   78.55    81.47    80.01
   98.99   101.01   100.00
  999.54   999.55  999.545
  999.55   999.55   999.55

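Python's built-in decimal module has several rounding options; see its documentation.
"What Every Computer Scientist Should Know About Floating-Point Arithmetic" gives an accessible overview of the IEEE floating-point standard.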
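For readers who do want to round at the very end, the decimal module's rounding modes are selected through `Decimal.quantize`. A brief sketch using the 999.545 average from the question; this step is optional and not needed when the exact result is wanted.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP

avg = (Decimal("999.54") + Decimal("999.55")) / 2  # exactly 999.545
cent = Decimal("0.01")  # target precision: two decimal places

half_up = avg.quantize(cent, rounding=ROUND_HALF_UP)      # ties go away from zero
half_even = avg.quantize(cent, rounding=ROUND_HALF_EVEN)  # ties go to the even digit
down = avg.quantize(cent, rounding=ROUND_DOWN)            # always truncate toward zero

print(half_up, half_even, down)
```

The three modes disagree on this tie, which is why the mode should be chosen explicitly when rounding matters.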

Answer 3

I found a solution.

First convert the columns to strings, then to Decimal. Everything works and I get the correct result without rounding.

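from decimal import Decimal

def getAvg(x,y):
    # x and y are Series of strings; Decimal(str) parses them exactly in base 10
    return ((x.apply(Decimal)+y.apply(Decimal))/Decimal(2)).apply(Decimal)

df['avg'] = getAvg(df['clm1'].astype('str'),df['clm2'].astype('str'))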
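Put together as a self-contained sketch, starting from a float DataFrame like the one in the question. It relies on the assumption that str() of each float gives back the short decimal text that was originally entered (true for these values); `to_decimal` is a hypothetical helper name.

```python
from decimal import Decimal

import pandas as pd

df = pd.DataFrame([[79.02, 80.98],
                   [78.55, 81.47],
                   [98.99, 101.01],
                   [999.54, 999.55],
                   [999.55, 999.55]], columns=['clm1', 'clm2'])


def to_decimal(column: pd.Series) -> pd.Series:
    """Convert a float column to Decimal via each value's string form."""
    return column.map(lambda v: Decimal(str(v)))


# Only the average is computed in Decimal; clm1 and clm2 stay as floats.
df['avg'] = (to_decimal(df['clm1']) + to_decimal(df['clm2'])) / Decimal(2)

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv()
print(csv_text)
```

The avg column has object dtype afterwards, so later numeric operations on it will be slower than on a float column.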

Pandas calculation gives incorrect decimal
