Using HLL_COUNT.MERGE as an analytic function

Time: 2019-01-28 15:58:49

Tags: google-bigquery

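I have a table that uses HLL sketches to count daily active users. It has many dimensions and metrics, but I only run into a real problem when I try to do something like this: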

SELECT 
    pet_type,
    SUM(number_of_pets_owned) as total_pets,
    SUM(number_of_pets_owned)/HLL_COUNT.MERGE(population) as pets_per_person,
FROM
    pet_database
GROUP BY
    partitiontime,
    pet_type

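The problem with this query is that every pets_per_person is always > 1, because the sketches are merged per pet_type group. What I really want is to merge them with something like an analytic function, OVER (PARTITION BY partitiontime). Like this: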

SELECT 
    pet_type,
    SUM(number_of_pets_owned) as total_pets,
    SUM(number_of_pets_owned)/HLL_COUNT.MERGE(population) OVER (PARTITION BY partitiontime) as pets_per_person,
FROM
    pet_database
GROUP BY
    partitiontime,
    pet_type

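...to find out how common each pet is in the population. But this is invalid syntax, because the HLL functions are not supported as aggregate analytic functions.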

Am I approaching this problem the wrong way, or am I missing a simple solution?

1 answer:

Answer 0: (score: 0)

What if you JOIN with the total instead?

Like this:

WITH sample_table AS (
  SELECT wiki, HLL_COUNT.INIT(title) sketch
  FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
  WHERE DATE(datehour) = "2019-01-27"
  AND wiki LIKE 'a%'
  GROUP BY wiki
)

SELECT wiki, HLL_COUNT.MERGE(sketch) count,
  FORMAT('%.2f%%', 100* HLL_COUNT.MERGE(sketch)
    / (SELECT HLL_COUNT.MERGE(sketch) FROM sample_table)
  ) percent
FROM sample_table 
GROUP BY wiki
ORDER BY count DESC
LIMIT 1000

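Applied to the table from the question, the same idea might look like the sketch below. It is untested and assumes pet_database has exactly the columns used in the question (partitiontime, pet_type, number_of_pets_owned, and population as an HLL sketch). The CTE names daily_population and daily_pets are made up for illustration. The sketches are merged once per partitiontime, without pet_type in the grouping, and that daily total is then joined back to the per-pet-type sums.

```sql
-- Assumption: column names are taken from the question; CTE names are hypothetical.
WITH daily_population AS (
  -- Merge all sketches for a day, regardless of pet_type,
  -- to get one estimated population per partitiontime.
  SELECT
    partitiontime,
    HLL_COUNT.MERGE(population) AS people
  FROM pet_database
  GROUP BY partitiontime
),
daily_pets AS (
  -- Plain additive metric, still grouped by pet_type.
  SELECT
    partitiontime,
    pet_type,
    SUM(number_of_pets_owned) AS total_pets
  FROM pet_database
  GROUP BY partitiontime, pet_type
)
SELECT
  p.partitiontime,
  p.pet_type,
  p.total_pets,
  -- SAFE_DIVIDE returns NULL instead of failing when people is 0.
  SAFE_DIVIDE(p.total_pets, d.people) AS pets_per_person
FROM daily_pets AS p
JOIN daily_population AS d
  ON p.partitiontime = d.partitiontime
```

The denominator here is the estimated number of distinct people for the whole day, so pets_per_person is no longer computed against only the owners of one pet_type.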