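I have a real problem with BigQuery when doing financial calculations. Results seem to be off by a penny depending on how the query is written, which is not acceptable for my needs. Here is an example. Consider this very simple dataset: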
TOT_AMT,DLY_AMT,SUN_AMT,CREDIT_COPIES,UNIT_COST_DLY,UNIT_COST_SUNDAY,DAILY_COPIES,SUNDAY_COPIES
81.91,16.58,65.33,15,1.105,4.355,1,1
10.67,0.0,10.67,3,1.245,3.555,0,1
24.74,8.16,16.58,3,1.36,5.525,2,1
38.03,0.0,38.03,9,0.0,4.225,0,1
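You would think that basic rounding and checking the result would not be too difficult, but unfortunately it is. I tried several approaches, shown below, and only one of them gives the right answer. Here is the query to run: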
SELECT
TOT_AMT,
ROUND(ROUND(UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES *10000)/10000 + ROUND(UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES * 10000)/10000,2) AS TOT_AMT_calc1,
ROUND( ( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES * 100 ) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES * 100 ) )/100, 2) AS TOT_AMT_calc2,
ROUND( ( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES ) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES ) ), 2) AS TOT_AMT_calc_FULL,
ROUND( ( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES *100) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES *100 ) ), 2) AS TOT_AMT_calc_FULL2,
( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES ) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES ) ) AS TOT_AMT_calc_FULL_NOROUND,
ROUND(ROUND( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES *1000) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES *1000) )/1000,2) AS TOT_AMT_calc_thousand,
ROUND(ROUND( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES *100) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES *100) )/100,2) AS TOT_AMT_calc_works
FROM
`my_project.my_table`
Run against this dataset, it produces the following output:
TOT_AMT,TOT_AMT_calc1,TOT_AMT_calc2,TOT_AMT_calc_FULL,TOT_AMT_calc_FULL2,TOT_AMT_calc_FULL_NOROUND,TOT_AMT_calc_thousand,TOT_AMT_calc_works
81.91,81.9,81.9,81.9,8190.0,81.9,81.9,81.9
10.67,10.66,10.66,10.67,1066.5,10.665000000000001,10.66,10.67
24.74,24.73,24.73,24.74,2473.5,24.735000000000003,24.73,24.74
38.03,38.02,38.02,38.02,3802.5,38.025,38.02,38.03
As you can see, the only expression that rounds correctly is this one:
ROUND(ROUND( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES *100) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES *100) )/100,2)
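The long tails in TOT_AMT_calc_FULL_NOROUND (for example 10.665000000000001) are typical of binary floating point: most decimal fractions, including 3.555 and 10.665, have no exact 64-bit float representation. The following is a minimal sketch in Python rather than BigQuery SQL, on the assumption that FLOAT64 is an IEEE 754 double like a Python float. It uses the UNIT_COST_SUNDAY and CREDIT_COPIES values from the second sample row to show the value a float actually holds.

```python
from decimal import Decimal

# Values from the second sample row: UNIT_COST_SUNDAY = 3.555, CREDIT_COPIES = 3.
unit_cost = 3.555
sunday_copies = 1
credit_copies = 3

# Decimal(float) exposes the exact binary value that the float really stores.
stored = Decimal(unit_cost)
product = Decimal(unit_cost * sunday_copies * credit_copies)

# The same calculation in exact decimal arithmetic, starting from the text "3.555".
exact = Decimal("3.555") * sunday_copies * credit_copies

print("stored unit cost:", stored)   # close to 3.555, but not equal to it
print("float product:   ", product)  # close to 10.665, but not equal to it
print("exact product:   ", exact)    # 10.665
```

Because the float product sits slightly to one side of 10.665, whether it rounds to 10.66 or 10.67 depends on the order of the multiplications and divisions around the ROUND call.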
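I need something that works every time, so that financial calculations come out exactly right. In this issue on the BigQuery issue tracker I described what I really want, which is to multiply by more than 100 (for example 10,000): https://issuetracker.google.com/issues/35906014
Alas, that does not work either; the rounding stops working.
Any other insight is much appreciated. I need a repeatable and accurate way to do real financial calculations, and BigQuery's ROUND fails even on small numbers. Would a UDF be better?
*UPDATE* I did some additional testing by converting and exporting the table as integers, first multiplying everything by 100 and then a second time by 10,000. It appears that BigQuery stores integers as INT64, and I see similar rounding problems with it. Using a table where all values are integers multiplied by 10,000, the only way I got accurate results was essentially the same as with FLOAT64, i.e. rounding the floating-point value at a scale of 100.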
SELECT
TOT_AMT,
ROUND((( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES) )/10000 ),2) AS TOT_AMT_calc_fail1,
((( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES) )/10000 )) AS TOT_AMT_calc_fail2,
(ROUND(( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES) )/100 )/100) AS TOT_AMT_calc_works3
FROM
`my_project.my_table`
I even tried re-casting, with no effect. Casting back to float does not seem to change anything, since INT64 appears to behave the same way as FLOAT64.
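One likely reason the integer table behaved like the float one is that the `/` operator in BigQuery returns FLOAT64 even when both operands are INT64, so the division reintroduces floating point. Scaled integers stay exact only if the final rounding is also done with integer arithmetic. The following is a minimal Python sketch of that idea, assuming unit costs are stored as thousandths (3.555 becomes 3555); the helper name `round_half_up_div` is hypothetical.

```python
def round_half_up_div(numerator: int, denominator: int) -> int:
    """Divide two non-negative integers, rounding halves up, without using floats."""
    return (2 * numerator + denominator) // (2 * denominator)

# Second sample row, with the unit cost scaled to thousandths (assumed scaling).
unit_cost_milli = 3555   # 3.555
sunday_copies = 1
credit_copies = 3

amount_milli = unit_cost_milli * sunday_copies * credit_copies  # 10665 thousandths
amount_cents = round_half_up_div(amount_milli, 10)              # thousandths -> cents

print(amount_cents)  # 1067, i.e. 10.67
```

The helper only handles non-negative amounts; credits or refunds stored as negative numbers would need their own rounding rule.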
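Answer 0 (score: 1)
After working with the Google team and reading up on rounding errors in floating point, I found a solution that is very accurate. It lets me keep my values stored as FLOAT in BigQuery but still get correctly rounded results at query time: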
SELECT
TOT_AMT,
ROUND(ROUND(UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES *10000)/10000 + ROUND(UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES * 10000)/10000,2) AS TOT_AMT_calc1,
ROUND( ( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES * 100 ) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES * 100 ) )/100, 2) AS TOT_AMT_calc2,
ROUND( ( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES ) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES ) ), 2) AS TOT_AMT_calc_FULL,
ROUND( ( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES *100) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES *100 ) ), 2) AS TOT_AMT_calc_FULL2,
( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES ) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES ) ) AS TOT_AMT_calc_FULL_NOROUND,
ROUND(ROUND( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES *1000) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES *1000) )/1000,2) AS TOT_AMT_calc_thousand,
ROUND(ROUND( ( UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES *100) + ( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES *100) )/100,2) AS TOT_AMT_calc_mostly_works,
ROUND( ( FLOOR(UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES *1000000000) + FLOOR( UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES *1000000000) ) / 1000000000+.005,2) AS TOT_AMT_calc_WORKS
FROM
`my_project.table`
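The important line is this one:
ROUND( ( FLOOR(UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES * 1000000000) + FLOOR(UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES * 1000000000) ) / 1000000000 + .005, 2) AS TOT_AMT_calc_WORKS
This removes the floating-point error: it scales each product to billionths (nano units), drops the garbage digits left over after each multiplication, and gives an accurate result.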
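If I needed my result to 3 decimal places, I would change the line as follows:
ROUND( ( FLOOR(UNIT_COST_DLY * DAILY_COPIES * CREDIT_COPIES * 1000000000) + FLOOR(UNIT_COST_SUNDAY * SUNDAY_COPIES * CREDIT_COPIES * 1000000000) ) / 1000000000 + .0005, 3) AS TOT_AMT_calc_WORKS
This approach lets me keep all values stored in BigQuery as FLOAT and apply the correction at query time, until Google adds a decimal type. :)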
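To check any such workaround independently, the sample rows can be recomputed with exact decimal arithmetic outside BigQuery. The following is a minimal Python sketch under two assumptions: the stored floats stand for the short decimal values shown in the sample data, and each component (daily and Sunday) is rounded half up to cents before the two are added. The second assumption is inferred from the DLY_AMT and SUN_AMT columns, not stated in the question. The helper names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Sample rows from the question, as
# (TOT_AMT, CREDIT_COPIES, UNIT_COST_DLY, UNIT_COST_SUNDAY, DAILY_COPIES, SUNDAY_COPIES).
ROWS = [
    (81.91, 15, 1.105, 4.355, 1, 1),
    (10.67, 3, 1.245, 3.555, 0, 1),
    (24.74, 3, 1.36, 5.525, 2, 1),
    (38.03, 9, 0.0, 4.225, 0, 1),
]

def to_decimal(value: float) -> Decimal:
    """Recover the short decimal a float was written as (assumption: repr round-trips it)."""
    return Decimal(repr(value))

def component(unit_cost: float, copies: int, credit_copies: int) -> Decimal:
    """One component amount, computed exactly and rounded half up to cents."""
    amount = to_decimal(unit_cost) * copies * credit_copies
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)

def total_amount(credit_copies, cost_dly, cost_sun, daily, sunday) -> Decimal:
    """Sum of the separately rounded daily and Sunday components."""
    return (component(cost_dly, daily, credit_copies)
            + component(cost_sun, sunday, credit_copies))

for tot_amt, *args in ROWS:
    print(tot_amt, total_amount(*args))
```

With these assumptions all four totals match TOT_AMT. The order of rounding matters for the first row: the exact components are 16.575 and 65.325, which round to 16.58 and 65.33 and add up to 81.91, while their unrounded sum is exactly 81.90.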