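I need to rebuild absolute frequencies from relative frequencies, given a known sample size.
This should be easy, but the absolute frequencies and the sample size are numpy.int64, while the relative frequencies are numpy.float64.
I know that decimal floating-point values often have no exact binary representation, so some loss of precision is to be expected. That seems to be what happens here: the floating-point operations produce unexpected results, and I cannot trust the rebuilt absolute frequencies.
Sample code that reproduces the error: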
import pandas as pd
import numpy as np
absolutes = np.arange(100000, dtype=np.int64) #numpy.int64
sample_size = absolutes.sum() # numpy.int64
relatives = absolutes / sample_size #float64
# Rebuilding absolutes from relatives
rebuilt_float = relatives * sample_size #float64
rebuilt_int = rebuilt_float.astype(np.int64)
df = pd.DataFrame({'absolutes': absolutes,
                   'relatives': relatives,
                   'rebuilt_float': rebuilt_float,
                   'rebuilt_int': rebuilt_int})
df['check_float'] = df['absolutes'] == df['rebuilt_float']
df['check_int'] = df['absolutes'] == df['rebuilt_int']
print('Failed FLOATS: ', len(df[df['check_float'] == False]))
print('Failed INTS:', len(df[df['check_int'] == False]))
print('Sum of FLOATS:', df['rebuilt_float'].sum())
print('Sum of INTS:', df['rebuilt_int'].sum())
Is it possible to solve this with numpy alone, without converting every number to a decimal?
Answer 0 (score: 1)
np.isclose(df['absolutes'], df['rebuilt_float'], atol=.99999)
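numpy.isclose() is a tolerance-aware comparison for floating-point values. It takes two extra parameters, rtol and atol, for the relative and absolute tolerance.
By changing atol you can see how much rounding error is being absorbed: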
>>> len(np.where( np.isclose(df['absolutes'], df['rebuilt_int'], atol=.99999) == False )[0])
0
>>> len(np.where( np.isclose(df['absolutes'], df['rebuilt_int'], atol=.5) == False )[0])
2767
>>> len(np.where( np.isclose(df['absolutes'], df['rebuilt_int'], atol=1) == False )[0])
0
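As a side note on reading these counts: np.isclose(a, b) tests |a - b| <= atol + rtol * |b|, and rtol defaults to 1e-05, so the relative term also contributes for large values. The sketch below uses made-up values (not the data from the question) to show how the two tolerances interact.

```python
import numpy as np

# Illustrative values only: each pair differs by exactly 1.
a = np.array([10.0, 100000.0])
b = np.array([11.0, 100001.0])

# Default rtol=1e-05: the allowed difference is atol + rtol * |b|.
# First pair:  1 <= 0.5 + 0.00011 is False.
# Second pair: 1 <= 0.5 + 1.00001 is True.
default_rtol = np.isclose(a, b, atol=0.5)

# rtol=0 turns the check into a purely absolute comparison.
absolute_only = np.isclose(a, b, rtol=0, atol=0.5)

print(default_rtol)   # [False  True]
print(absolute_only)  # [False False]
```

If a strictly absolute tolerance is what you want, passing rtol=0 explicitly avoids the hidden relative allowance.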
Answer 1 (score: 0)
If you round the rebuilt values before converting them to integers, you get zero failed integers. That is, use
rebuilt_int = np.round(rebuilt_float).astype(np.int64)
and the output becomes
Failed FLOATS: 11062
Failed INTS: 0
Sum of FLOATS: 4999950000.0
Sum of INTS: 4999950000
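The reason rounding helps is that astype(np.int64) truncates toward zero, so a rebuilt value that lands just below the true integer loses a whole unit. The following is a minimal sketch comparing the two casts on the same data as the question. It uses np.rint, which rounds to the nearest integer and is used here as a stand-in for np.round; the variable names truncated, rounded, truncated_mismatches and rounded_mismatches are introduced only for illustration.

```python
import numpy as np

absolutes = np.arange(100000, dtype=np.int64)
sample_size = absolutes.sum()
relatives = absolutes / sample_size          # float64

rebuilt_float = relatives * sample_size      # float64, may be off by a few ulps

# Plain cast: truncates toward zero, so 41.99999999 would become 41.
truncated = rebuilt_float.astype(np.int64)

# Round to the nearest integer first (result is still float64), then cast.
rounded = np.rint(rebuilt_float).astype(np.int64)

truncated_mismatches = int((truncated != absolutes).sum())
rounded_mismatches = int((rounded != absolutes).sum())

print('Mismatches after truncation:', truncated_mismatches)
print('Mismatches after rounding:  ', rounded_mismatches)
print('Sum matches sample size:    ', rounded.sum() == sample_size)
```

Comparing the rebuilt integers against the originals with != keeps the check exact, with no tolerance needed once the values are integers again.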