Question

Imagine a table with two columns, as follows:

Account_ID (integer)
Product_ID (integer)

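The other columns don't matter. The table lists the products purchased by each account. I want to create an output with three columns: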

Account_ID_1 | Account_ID_2 | Count(distinct product_ID)

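The result should contain every value of Account_ID and, for each pair of Account_IDs, the distinct count of Product_IDs the two accounts have in common.

I am using Google BigQuery. Is there a way to do this in SQL, or should I plan on coding it in a full programming language?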

Answer 1

Here I count how many products two accounts have in common.

=IF( $B7 = "", "",
IFERROR( --TRIM( LEFT( $E7, FIND( " ", $E7 ) ) ), "!Err" ) )

Answer 2

This worked for me:

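-- Self-join on Product_ID: each joined row is a product both accounts bought.
-- count(distinct ...) counts a product once even if it was purchased repeatedly.
select
   t1.Account_ID as Account_ID_1,
   t2.Account_ID as Account_ID_2,
   count(distinct t1.Product_ID) as count_product_id
from
   MYTABLE t1 join MYTABLE t2 on t1.Product_ID = t2.Product_ID
where t1.Account_ID <> t2.Account_ID
group by t1.Account_ID, t2.Account_ID
order by 1, 2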
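
To check the self-join on a small case, here is a minimal sketch that runs a query of the same shape against an in-memory SQLite database from Python. The sample rows and the function name `common_product_counts` are made up for illustration, and SQLite stands in for BigQuery only because it is easy to run locally, so dialect details may differ. It uses COUNT(DISTINCT ...) so that a product bought more than once by the same account is counted once, as the question asks.

```python
import sqlite3

def common_product_counts(rows):
    """Return (account_1, account_2, n_common_products) for every ordered
    pair of different accounts that share at least one product.

    `rows` is an iterable of (Account_ID, Product_ID) tuples (hypothetical data).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MYTABLE (Account_ID INTEGER, Product_ID INTEGER)")
    conn.executemany("INSERT INTO MYTABLE VALUES (?, ?)", rows)
    query = """
        SELECT t1.Account_ID, t2.Account_ID, COUNT(DISTINCT t1.Product_ID)
        FROM MYTABLE t1 JOIN MYTABLE t2 ON t1.Product_ID = t2.Product_ID
        WHERE t1.Account_ID <> t2.Account_ID
        GROUP BY t1.Account_ID, t2.Account_ID
        ORDER BY 1, 2
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Toy data: account 1 bought product 10 twice; accounts 1 and 2 share 10 and 20.
sample_rows = [(1, 10), (1, 10), (1, 20), (2, 10), (2, 20), (3, 20), (3, 30)]
pairs = common_product_counts(sample_rows)
for row in pairs:
    print(row)
```

Because the filter is `<>`, every pair appears twice, once in each order. With a plain COUNT instead of COUNT(DISTINCT ...), the duplicate purchase in the toy data would inflate the count for accounts 1 and 2 from 2 to 3.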

Answer 3

BigQuery version:

(The JOIN is on equality only; the inequality comparison stays in the WHERE clause.)

SELECT a.corpus, b.corpus, EXACT_COUNT_DISTINCT(a.word) c
FROM
(SELECT corpus, word FROM [publicdata:samples.shakespeare]) a
JOIN
(SELECT corpus, word FROM [publicdata:samples.shakespeare]) b
ON a.word=b.word
WHERE a.corpus>b.corpus
GROUP BY 1, 2
ORDER BY c DESC
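
For the "full programming language" option raised in the question, the same idea (match on equality, then keep each pair in one order only) can be sketched in plain Python. This is an illustrative equivalent, not part of the BigQuery answer; the function name `common_counts` and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def common_counts(rows):
    """Map (larger_id, smaller_id) -> number of distinct shared products.

    `rows` is an iterable of (account_id, product_id) tuples.
    """
    # Group the distinct accounts under each product (the equality "join" key).
    accounts_by_product = defaultdict(set)
    for account_id, product_id in rows:
        accounts_by_product[product_id].add(account_id)

    counts = defaultdict(int)
    for accounts in accounts_by_product.values():
        # combinations() over a sorted list yields each unordered pair once
        # with a < b; store it as (b, a) to mirror the a.corpus > b.corpus filter.
        for a, b in combinations(sorted(accounts), 2):
            counts[(b, a)] += 1
    return dict(counts)

sample = [(1, 10), (1, 10), (1, 20), (2, 10), (2, 20), (3, 20), (3, 30)]
result = common_counts(sample)
print(result)
```

Using a set per product makes the count distinct by construction, and emitting each pair in a single order halves the output compared with a `<>` filter.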

Find the list of accounts that products have in common

3 answers: