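I have a table that uses HLL sketches to count daily active users. I have a lot of dimensions and metrics, but I only really run into trouble when I want to do something like this: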
SELECT
  pet_type,
  SUM(number_of_pets_owned) as total_pets,
  SUM(number_of_pets_owned)/HLL_COUNT.MERGE(population) as pets_per_person,
FROM
  pet_database
GROUP BY
  partitiontime,
  pet_type
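The problem with this is that every pets_per_person is always > 1, because the sketches are merged per pet_type, so the denominator only counts people who own that type of pet rather than the whole population for the day. What I really want is to use something like an analytic function, OVER (PARTITION BY partitiontime), like this: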
SELECT
  pet_type,
  SUM(number_of_pets_owned) as total_pets,
  SUM(number_of_pets_owned)/HLL_COUNT.MERGE(population) OVER (PARTITION BY partitiontime) as pets_per_person,
FROM
  pet_database
GROUP BY
  partitiontime,
  pet_type
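...to find out which pets are common across the population. But this is invalid syntax, because the HLL functions are not supported as aggregate analytic functions.
Am I approaching this problem the wrong way, or am I missing a simple solution?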
Answer 0 (score: 0)
What if you JOIN with the total instead?
Something like:
WITH sample_table AS (
  SELECT wiki, HLL_COUNT.INIT(title) sketch
  FROM `fh-bigquery.wikipedia_v3.pageviews_2019`
  WHERE DATE(datehour) = "2019-01-27"
  AND wiki LIKE 'a%'
  GROUP BY wiki
)
SELECT wiki, HLL_COUNT.MERGE(sketch) count,
  FORMAT('%.2f%%', 100* HLL_COUNT.MERGE(sketch)
    / (SELECT HLL_COUNT.MERGE(sketch) FROM sample_table)
  ) percent
FROM sample_table
GROUP BY wiki
ORDER BY count DESC
LIMIT 1000
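
Applied to the table in the question, the same idea might look roughly like the sketch below. It is untested and assumes pet_database has exactly the columns shown in the question (partitiontime, pet_type, number_of_pets_owned, and population as an HLL sketch). The CTE names daily_population and pets_by_type and the people column are made up for illustration. The daily total is merged once per partitiontime and then joined back to the per-pet_type rows, which stands in for the unsupported OVER (PARTITION BY partitiontime).

```sql
-- Sketch only: column names are taken from the question;
-- daily_population, pets_by_type and people are hypothetical names.
WITH daily_population AS (
  -- One merged sketch per day, across all pet types.
  SELECT
    partitiontime,
    HLL_COUNT.MERGE(population) AS people
  FROM pet_database
  GROUP BY partitiontime
),
pets_by_type AS (
  -- Pet totals per day and pet type, as in the original query.
  SELECT
    partitiontime,
    pet_type,
    SUM(number_of_pets_owned) AS total_pets
  FROM pet_database
  GROUP BY partitiontime, pet_type
)
SELECT
  p.partitiontime,
  p.pet_type,
  p.total_pets,
  -- SAFE_DIVIDE returns NULL instead of failing if a day has zero people.
  SAFE_DIVIDE(p.total_pets, d.people) AS pets_per_person
FROM pets_by_type AS p
JOIN daily_population AS d
  ON p.partitiontime = d.partitiontime
ORDER BY p.partitiontime, pets_per_person DESC
```

Because the denominator is now the approximate distinct population for the whole day, pets_per_person can fall below 1 for pet types that few people own.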