Google BigQuery SUM returns the wrong result

Time: 2018-10-22 14:53:13

Tags: python pandas google-bigquery

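I am running this query on the public blockchain dataset to get the total number of tokens burned. However, SUM returns a result far smaller than the real one (which I get by running the same query without the SUM and doing the summation in pandas). BigQuery gives 8306, whereas pandas gives 328608.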

log.data is a hexadecimal number.

SELECT
  SUM(SAFE_CAST(log.data as INT64)/POW(10,18))
FROM
  `bigquery-public-data.ethereum_blockchain.logs` AS log
WHERE TRUE
  AND log.address = '0xf53ad2c6851052a81b42133467480961b2321c09'
  AND log.block_timestamp >= '2018-01-01 00:00:01'
  AND log.block_timestamp <= '2018-12-01 00:00:01'
  AND SUBSTR(log.topics[SAFE_OFFSET(0)], 1, 10) IN ('0x42696c68','0xcc16f5db')
I don't fully understand why this happens. I would appreciate an answer :)

1 answer:

Answer 0: (score: 3)

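The problem is that some log.data values are excluded from the SUM because they do not fit in the INT64 range, so SAFE_CAST(log.data AS INT64) returns NULL for them, and SUM ignores NULL inputs. For example, 0x00000000000000000000000000000000000000000000000080b7978da47c78d2 is greater than the maximum INT64 value, 9223372036854775807, which is 0x7FFFFFFFFFFFFFFF in hexadecimal.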
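To see the overflow outside BigQuery, here is a minimal Python sketch that checks whether a hex string fits in a signed 64-bit integer. The helper name fits_int64 is made up for illustration; the sample value is the one quoted above.

```python
# Largest value a signed 64-bit integer (BigQuery INT64) can hold.
INT64_MAX = 2**63 - 1  # 9223372036854775807
INT64_MIN = -(2**63)


def fits_int64(hex_value: str) -> bool:
    """Return True if the hex string parses to a value inside the INT64 range.

    Hypothetical helper, not part of BigQuery or pandas.
    """
    return INT64_MIN <= int(hex_value, 16) <= INT64_MAX


# The log.data value quoted in the answer.
sample = "0x00000000000000000000000000000000000000000000000080b7978da47c78d2"

print(hex(INT64_MAX))      # 0x7fffffffffffffff
print(fits_int64(sample))  # False: the top byte 0x80 is above 0x7f
```

Rows like this one are the ones that SAFE_CAST turns into NULL, so they never reach the SUM.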

You can cast the log.data values to FLOAT64 instead, which produces a result closer to what you see with pandas:

SELECT
  SUM(CAST(log.data as FLOAT64)/POW(10,18))
FROM
  `bigquery-public-data.ethereum_blockchain.logs` AS log
WHERE TRUE
  AND log.address = '0xf53ad2c6851052a81b42133467480961b2321c09'
  AND log.block_timestamp >= '2018-01-01 00:00:01'
  AND log.block_timestamp <= '2018-12-01 00:00:01'
  AND SUBSTR(log.topics[SAFE_OFFSET(0)], 1, 10) IN ('0x42696c68','0xcc16f5db')

This returns 329681.7942642243
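If an exact total matters, one option is to sum on the client side with Python's arbitrary-precision integers, since a 64-bit float carries only a 53-bit significand and so cannot represent every integer of this size exactly. The following is a sketch only: the function name sum_tokens is hypothetical, the 10**18 divisor is taken from the query above, and the second sample row is an invented value (exactly 10**18) used purely for illustration.

```python
from decimal import Decimal

# Divisor taken from the query above (POW(10, 18)).
UNITS_PER_TOKEN = 10**18


def sum_tokens(hex_values):
    """Sum hex-encoded amounts exactly, then convert to whole tokens.

    Hypothetical helper: Python ints never overflow, so no row is dropped.
    """
    total_units = sum(int(v, 16) for v in hex_values)
    return Decimal(total_units) / Decimal(UNITS_PER_TOKEN)


rows = [
    # Value quoted in the answer; too large for INT64.
    "0x00000000000000000000000000000000000000000000000080b7978da47c78d2",
    # Invented sample row: exactly 10**18 units, i.e. one token.
    "0x0000000000000000000000000000000000000000000000000de0b6b3a7640000",
]

print(sum_tokens(rows))
```

In practice the hex strings would come from the log.data column of the unaggregated query result, for example a pandas Series passed as hex_values.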