Question

I have a BigQuery table that looks like this and cannot be modified:

Country	Customer	Connections	Purchases	Country Metric 1	Country Metric 2
Brazil	A	10	1	3	1000
Brazil	B	90	5	3	1000
Brazil	C	80	2	3	1000
Namibia	B	20	1	5	2000
Namibia	C	150	2	5	2000

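A few notes about this table:

Each Country-Customer combination is unique.
As the name suggests, the country metrics depend only on the country.
For some countries, some metrics are unavailable (NULL in the table).
For some Country-Customer combinations, the number of connections or purchases is unavailable.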

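I would like to get the following in a single query:

The average of Country Metric 1, considering only the Country-Customer combinations with at least 2 purchases. In the sample table there are three such combinations: Brazil-B, Brazil-C, and Namibia-C. The average should count Brazil only once, so the result is (3 + 5) / 2 = 4.
The average of Country Metric 2, considering only the Country-Customer combinations with more than 100 connections. Only one combination in the sample table meets this condition: Namibia-C. The expected result is therefore 2000.

These are just examples; there can be more metrics and other aggregations (sum, min, max, count...), but they should all work in a very similar way.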

Here is what I have tried:

SELECT AVG(IF(purchases >= 2, country_metric_1, NULL)), -- => 6.5
       AVG(IF(connections > 100, country_metric_2, NULL)) -- => 2000
FROM table

Problem: if the same country appears in several qualifying combinations, its metric is counted several times.
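To make the double counting concrete, here is a minimal sketch that inlines the sample rows from the table above and applies the same conditional aggregation with the `>= 2` threshold from the requirement. The CTE name `sample` and the column name `customer` are assumptions made for illustration, not the real schema. Brazil qualifies twice, so its metric enters the average twice:

```sql
-- Sample rows from the question, inlined so the query runs without the real table.
-- `sample` and `customer` are illustrative names (assumptions).
WITH sample AS (
  SELECT 'Brazil' AS country, 'A' AS customer, 10 AS connections, 1 AS purchases,
         3 AS country_metric_1, 1000 AS country_metric_2
  UNION ALL SELECT 'Brazil',  'B',  90, 5, 3, 1000
  UNION ALL SELECT 'Brazil',  'C',  80, 2, 3, 1000
  UNION ALL SELECT 'Namibia', 'B',  20, 1, 5, 2000
  UNION ALL SELECT 'Namibia', 'C', 150, 2, 5, 2000
)
SELECT
  -- Qualifying rows: Brazil-B (3), Brazil-C (3), Namibia-C (5),
  -- so this is (3 + 3 + 5) / 3, about 3.67, instead of the wanted 4.
  AVG(IF(purchases >= 2, country_metric_1, NULL)) AS avg_metric_1,
  -- Only Namibia-C qualifies, so no duplication happens here: 2000.
  AVG(IF(connections > 100, country_metric_2, NULL)) AS avg_metric_2
FROM sample
```

The second column happens to be correct only because a single row passes its filter.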

SELECT AVG(IF(purchases >= 2, country_metric_1_p, NULL)), -- => random
       AVG(IF(connections > 100, country_metric_2_p, NULL)) -- => random
FROM (SELECT purchases,
             connections,
             -- keep each country's metrics on one arbitrary row only
             IF(ROW_NUMBER() OVER (PARTITION BY country) = 1, country_metric_1, NULL) AS country_metric_1_p,
             IF(ROW_NUMBER() OVER (PARTITION BY country) = 1, country_metric_2, NULL) AS country_metric_2_p
      FROM table)

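Problem: only one combination per country is considered, and it is picked arbitrarily, so the result is too low and effectively random...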
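The failure mode can be reproduced deterministically with a small sketch. It inlines the sample rows and adds `ORDER BY customer` to the window, which is an assumption made only so the kept row is predictable (`sample`, `first_row_only`, and `customer` are illustrative names). The row that keeps the metric is Brazil-A and Namibia-B, and neither passes the filters, so both averages come out NULL:

```sql
-- Sample rows from the question, inlined so the query runs without the real table.
WITH sample AS (
  SELECT 'Brazil' AS country, 'A' AS customer, 10 AS connections, 1 AS purchases,
         3 AS country_metric_1, 1000 AS country_metric_2
  UNION ALL SELECT 'Brazil',  'B',  90, 5, 3, 1000
  UNION ALL SELECT 'Brazil',  'C',  80, 2, 3, 1000
  UNION ALL SELECT 'Namibia', 'B',  20, 1, 5, 2000
  UNION ALL SELECT 'Namibia', 'C', 150, 2, 5, 2000
),
first_row_only AS (
  SELECT purchases,
         connections,
         -- ORDER BY customer is added here only to make the kept row predictable.
         IF(ROW_NUMBER() OVER (PARTITION BY country ORDER BY customer) = 1,
            country_metric_1, NULL) AS country_metric_1_p,
         IF(ROW_NUMBER() OVER (PARTITION BY country ORDER BY customer) = 1,
            country_metric_2, NULL) AS country_metric_2_p
  FROM sample
)
SELECT
  -- The kept rows have purchases = 1, so no metric survives the filter: NULL.
  AVG(IF(purchases >= 2, country_metric_1_p, NULL)) AS avg_metric_1,
  -- The kept rows have 10 and 20 connections, so this is NULL as well.
  AVG(IF(connections > 100, country_metric_2_p, NULL)) AS avg_metric_2
FROM first_row_only
```

The deduplication happens before the filter is applied, so the surviving row has no reason to be one of the qualifying rows.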

Is there a way to do this?

Answer 1

With this query:

SELECT MAX(CASE WHEN purchases >= 2 THEN country_metric_1 END) country_metric_1,
       MAX(CASE WHEN connections > 100 THEN country_metric_2 END) country_metric_2
FROM tablename 
WHERE purchases >= 2 OR connections > 100
GROUP BY country

you get the required metrics exactly once per country.
Use the query above as a CTE and average its results:

WITH cte AS (
    SELECT MAX(CASE WHEN purchases >= 2 THEN country_metric_1 END) country_metric_1,
           MAX(CASE WHEN connections > 100 THEN country_metric_2 END) country_metric_2
    FROM tablename 
    WHERE purchases >= 2 OR connections > 100
    GROUP BY country 
)
SELECT AVG(country_metric_1) avg_country_metric_1,
       AVG(country_metric_2) avg_country_metric_2
FROM cte
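For anyone who wants to run this without access to the real table, here is a sketch of the same query with the sample rows from the question inlined. The names `sample`, `per_country`, and `customer` are assumptions for illustration, and `country` is added to the per-country step only to make the intermediate rows readable:

```sql
-- Sample rows from the question, inlined so the query runs without the real table.
WITH sample AS (
  SELECT 'Brazil' AS country, 'A' AS customer, 10 AS connections, 1 AS purchases,
         3 AS country_metric_1, 1000 AS country_metric_2
  UNION ALL SELECT 'Brazil',  'B',  90, 5, 3, 1000
  UNION ALL SELECT 'Brazil',  'C',  80, 2, 3, 1000
  UNION ALL SELECT 'Namibia', 'B',  20, 1, 5, 2000
  UNION ALL SELECT 'Namibia', 'C', 150, 2, 5, 2000
),
per_country AS (
  -- One row per country; a metric is NULL when no row of that country qualifies.
  SELECT country,
         MAX(CASE WHEN purchases >= 2 THEN country_metric_1 END) AS country_metric_1,
         MAX(CASE WHEN connections > 100 THEN country_metric_2 END) AS country_metric_2
  FROM sample
  WHERE purchases >= 2 OR connections > 100
  GROUP BY country
  -- Intermediate rows: Brazil -> 3, NULL; Namibia -> 5, 2000
)
SELECT AVG(country_metric_1) AS avg_country_metric_1,  -- (3 + 5) / 2 = 4
       AVG(country_metric_2) AS avg_country_metric_2   -- 2000, Brazil's NULL is ignored
FROM per_country
```

`MAX` works as the collapsing function here because each metric has a single value per country, so any qualifying row carries the same number.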

See the demo.
Result:

avg_country_metric_1	avg_country_metric_2
4	2000
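The question also mentions other aggregations (sum, min, max, count). Once the metrics are collapsed to one row per country, the outer query can swap `AVG` for any of them. The following is a sketch under the same assumptions as before: the sample rows are inlined, and `sample`, `per_country`, `customer`, and the output column names are illustrative.

```sql
-- Sample rows from the question, inlined so the query runs without the real table.
WITH sample AS (
  SELECT 'Brazil' AS country, 'A' AS customer, 10 AS connections, 1 AS purchases,
         3 AS country_metric_1, 1000 AS country_metric_2
  UNION ALL SELECT 'Brazil',  'B',  90, 5, 3, 1000
  UNION ALL SELECT 'Brazil',  'C',  80, 2, 3, 1000
  UNION ALL SELECT 'Namibia', 'B',  20, 1, 5, 2000
  UNION ALL SELECT 'Namibia', 'C', 150, 2, 5, 2000
),
per_country AS (
  -- No WHERE clause here: a country without qualifying rows just yields NULL.
  SELECT country,
         MAX(CASE WHEN purchases >= 2 THEN country_metric_1 END) AS country_metric_1,
         MAX(CASE WHEN connections > 100 THEN country_metric_2 END) AS country_metric_2
  FROM sample
  GROUP BY country
)
SELECT SUM(country_metric_1)   AS sum_metric_1,        -- 3 + 5 = 8
       COUNT(country_metric_1) AS countries_metric_1,  -- 2 countries qualify
       MIN(country_metric_2)   AS min_metric_2,        -- 2000
       MAX(country_metric_2)   AS max_metric_2         -- 2000
FROM per_country
```

Dropping the `WHERE` clause does not change the results, because the `CASE` expressions already return NULL for non-qualifying rows and the aggregates skip NULLs. It also avoids having to keep the `WHERE` condition in sync when more metrics are added. Note that `COUNT(*)` would count countries whose metric is NULL, so `COUNT(column)` is the safer choice.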

Aggregating results without duplicates over multiple columns, with a different filter per column
