I have a table of purchasing decisions that looks like this:
org_id item_id spend
--------------------------
123 AAB 2
123 AAC 4
124 AAB 10
124 AAD 5
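I want to find all items that were purchased by three or fewer organizations, and then order them by total spend, alongside the IDs of those organizations.
This is my query to get the list of items: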
SELECT
  item_id,
  EXACT_COUNT_DISTINCT(org_id) AS org_count,
  SUM(spend) AS total_spend
FROM
  [mytable]
GROUP BY
  item_id
HAVING
  org_count < 4
ORDER BY
  total_spend DESC
It gives me results like this:
item_id org_count total_spend
--------------------------
AAB 2 12
AAD 1 5
AAC 1 4
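But I need to extend this query to also return the IDs of those organizations.
Is this possible in a single query, or do I need to run several?
The query to get the organization IDs would look like this: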
SELECT
  org_id
FROM
  mytable
WHERE item_id IN (SELECT item_id ... etc, query as above)
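But I'm not sure how to glue the two together.
Update: Ideally, I'd end up with something very similar to the original table, but containing only the items purchased by three or fewer organizations: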
org_id item_id spend
--------------------------
123 AAB 2
123 AAC 4
124 AAB 10
124 AAD 5
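Answer 0 (score: 0)
Try a query like this. In the result set you will see all items purchased by three or fewer organizations, along with the total spend.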
SELECT T2.org_id,
       T.item_id,
       T.total_spend
FROM table AS T2
JOIN
  (SELECT item_id,
          SUM(spend) AS total_spend
   FROM table
   GROUP BY item_id
   HAVING COUNT(DISTINCT org_id) < 4) AS T ON T.item_id = T2.item_id
ORDER BY T.total_spend DESC
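To try the join approach locally, here is a minimal sketch that runs the same idea against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is only a stand-in for BigQuery here, and the table name `purchases`, the alias `totals`, and the extra item `ZZZ` (bought by four organizations, so it should be filtered out) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (org_id INTEGER, item_id TEXT, spend INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        # Sample rows from the question.
        (123, "AAB", 2), (123, "AAC", 4), (124, "AAB", 10), (124, "AAD", 5),
        # Hypothetical item bought by four organizations; it should be excluded.
        (201, "ZZZ", 1), (202, "ZZZ", 1), (203, "ZZZ", 1), (204, "ZZZ", 1),
    ],
)

# The subquery keeps items with fewer than four distinct buyers and computes
# their total spend; the join brings back the individual purchase rows.
join_rows = conn.execute(
    """
    SELECT p.org_id, p.item_id, p.spend, totals.total_spend
    FROM purchases AS p
    JOIN (
        SELECT item_id, SUM(spend) AS total_spend
        FROM purchases
        GROUP BY item_id
        HAVING COUNT(DISTINCT org_id) < 4
    ) AS totals ON totals.item_id = p.item_id
    ORDER BY totals.total_spend DESC, p.org_id
    """
).fetchall()

for row in join_rows:
    print(row)
```

Each output row keeps the original `org_id`, `item_id`, and `spend`, plus the per-item total used for ordering.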
Answer 1 (score: 0)
The function you want is GROUP_CONCAT(). However, BigQuery has no DISTINCT option for it, so use a subquery:
SELECT item_id, COUNT(*) AS org_count,
       SUM(io_spend) AS total_spend,
       GROUP_CONCAT(org_id, ', ') AS orgs
FROM (SELECT item_id, org_id, SUM(spend) AS io_spend
      FROM t
      GROUP BY item_id, org_id
     ) io
GROUP BY item_id
HAVING org_count < 4
ORDER BY total_spend DESC;
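The same pattern can be checked with SQLite, which also has a `GROUP_CONCAT` aggregate. This is only a sketch under assumptions: SQLite stands in for BigQuery, the table name `purchases` and the item `ZZZ` are made up for the example, and the `HAVING` clause repeats `COUNT(*)` rather than relying on the alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (org_id INTEGER, item_id TEXT, spend INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        # Sample rows from the question.
        (123, "AAB", 2), (123, "AAC", 4), (124, "AAB", 10), (124, "AAD", 5),
        # Hypothetical item bought by four organizations; it should be excluded.
        (201, "ZZZ", 1), (202, "ZZZ", 1), (203, "ZZZ", 1), (204, "ZZZ", 1),
    ],
)

# The inner query collapses the data to one row per (item, organization),
# so COUNT(*) in the outer query counts distinct organizations per item.
concat_rows = conn.execute(
    """
    SELECT item_id,
           COUNT(*) AS org_count,
           SUM(io_spend) AS total_spend,
           GROUP_CONCAT(org_id, ', ') AS orgs
    FROM (
        SELECT item_id, org_id, SUM(spend) AS io_spend
        FROM purchases
        GROUP BY item_id, org_id
    ) AS io
    GROUP BY item_id
    HAVING COUNT(*) < 4
    ORDER BY total_spend DESC
    """
).fetchall()

for row in concat_rows:
    print(row)
```

The order of IDs inside the concatenated string is not guaranteed, so sort them after splitting if the order matters.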
Edit:
If you want the IDs on separate rows, here is a version of the SQL that may work in BigQuery:
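SELECT item_id, org_id, org_spend, total_spend, numOrgs
FROM (SELECT item_id, org_id,
             SUM(spend) AS org_spend,
             SUM(SUM(spend)) OVER (PARTITION BY item_id) AS total_spend,
             COUNT(*) OVER (PARTITION BY item_id) AS numOrgs
      FROM t
      GROUP BY item_id, org_id
     ) io
-- Window results cannot be filtered in HAVING, so filter in an outer WHERE.
WHERE numOrgs < 4;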
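For the one-row-per-organization variant, a rough SQLite equivalent is sketched below. It assumes SQLite 3.25 or newer (the first release with window functions), uses the made-up table name `purchases` and the hypothetical item `ZZZ`, and splits the work into an aggregation step and a separate window step instead of nesting `SUM(SUM(...))`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (org_id INTEGER, item_id TEXT, spend INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        # Sample rows from the question.
        (123, "AAB", 2), (123, "AAC", 4), (124, "AAB", 10), (124, "AAD", 5),
        # Hypothetical item bought by four organizations; it should be excluded.
        (201, "ZZZ", 1), (202, "ZZZ", 1), (203, "ZZZ", 1), (204, "ZZZ", 1),
    ],
)

# Step 1 (innermost): one row per (item, organization) with that pair's spend.
# Step 2 (middle): window functions add the per-item total and buyer count.
# Step 3 (outer): filter on the window result, which HAVING cannot do.
window_rows = conn.execute(
    """
    SELECT item_id, org_id, org_spend, total_spend, num_orgs
    FROM (
        SELECT item_id,
               org_id,
               org_spend,
               SUM(org_spend) OVER (PARTITION BY item_id) AS total_spend,
               COUNT(*) OVER (PARTITION BY item_id) AS num_orgs
        FROM (
            SELECT item_id, org_id, SUM(spend) AS org_spend
            FROM purchases
            GROUP BY item_id, org_id
        )
    )
    WHERE num_orgs < 4
    ORDER BY total_spend DESC, org_id
    """
).fetchall()

for row in window_rows:
    print(row)
```

No join is needed: every surviving row already carries its item's total spend and buyer count.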
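Answer 2 (score: 0)
In BigQuery, JOINs can sometimes be a real headache (depending on many factors), so it is always good to have some join-free solutions in your toolbox.
Below are a couple of examples based on window functions. I think both are interesting from a practice and learning perspective.
Option #1 - using the GROUP_CONCAT / regexp trick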
SELECT org_id, item_id, spend
FROM (
SELECT org_id, item_id, spend,
GROUP_CONCAT(STRING(org_id)) OVER(PARTITION BY item_id) AS orgs
FROM table
)
WHERE 1 + LENGTH(REGEXP_REPLACE(orgs, r'[^,]', '')) < 4
ORDER BY item_id, org_id
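The WHERE clause in Option #1 counts the entries in the concatenated string by deleting every character except the commas and adding one. Here is a small Python sketch of that arithmetic; the helper name `count_entries` is made up for illustration.

```python
import re


def count_entries(orgs: str) -> int:
    """Count comma-separated entries: keep only the commas, then add one."""
    return 1 + len(re.sub(r"[^,]", "", orgs))


print(count_entries("123,124"))      # 2
print(count_entries("123"))          # 1
print(count_entries("123,124,123"))  # 3: a repeated ID is counted each time
```

As the last call shows, the trick counts rows in the partition, so it equals the number of distinct organizations only when each organization appears at most once per item.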
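Option #2 - assuming the number of organizations per item is not too large (in legacy BigQuery SQL, COUNT(DISTINCT) becomes a statistical approximation for large numbers of distinct values, so the count could otherwise be imprecise):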
SELECT org_id, item_id, spend
FROM (
SELECT org_id, item_id, spend,
COUNT(DISTINCT org_id) OVER(PARTITION BY item_id) AS orgs
FROM table
)
WHERE orgs < 4
ORDER BY item_id, org_id