I have the following table in BigQuery:
WITH results AS (
  SELECT 1 AS customerid, ["apples", "bananas", "grapes", "orange"] AS fruit_array, [0.1, 0.4, 0.3, 0.2] AS probability
  UNION ALL
  SELECT 2 AS customerid, ["apples", "bananas", "grapes", "orange"] AS fruit_array, [0.2, 0.1, 0.6, 0.1] AS probability
  UNION ALL
  SELECT 3 AS customerid, ["apples", "bananas", "grapes", "orange"] AS fruit_array, [0.5, 0.05, 0.35, 0.1] AS probability
)
SELECT * FROM results
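Here, each customer has some probability of purchasing each fruit. For each customer, I want to get the top 2 fruits by purchase probability, together with their probabilities.
An output like this would be good: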
customerid, fruits, probability
1, bananas, 0.4
1, grapes, 0.3
..
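In the final result above, for customerid 1 I select only bananas and grapes, because those two fruits have the highest purchase probabilities (from [0.1,0.4,0.3,0.2]).
Is there any function in BigQuery I can use to achieve this?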
Answer 0 (score: 1)
Below is for BigQuery Standard SQL
#standardSQL
WITH results AS (
  SELECT 1 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.1,0.4,0.3,0.2] AS probability UNION ALL
  SELECT 2 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.2,0.1,0.6,0.1] AS probability UNION ALL
  SELECT 3 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.5,0.05,0.35,0.1] AS probability
)
SELECT customerid, fruit, probability
FROM (
  -- Pair each fruit with its probability by array position,
  -- then keep the 2 most probable pairs per customer
  SELECT customerid, ARRAY_AGG(STRUCT(fruit, probability) ORDER BY probability DESC LIMIT 2) top
  FROM results,
  UNNEST(probability) probability WITH OFFSET off1
  JOIN UNNEST(fruit_array) fruit WITH OFFSET off2
  ON off1 = off2
  GROUP BY customerid
), UNNEST(top)  -- flatten the 2-element array back into rows
with result
Row  customerid  fruit    probability
1    1           bananas  0.4
2    1           grapes   0.3
3    2           grapes   0.6
4    2           apples   0.2
5    3           apples   0.5
6    3           grapes   0.35
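To see what the join on offsets produces before ARRAY_AGG picks the top 2, here is a minimal sketch restricted to customer 1. The alias p and the output column name position are my own choices for illustration, not part of the query above.

```sql
#standardSQL
WITH results AS (
  SELECT 1 AS customerid, ["apples", "bananas", "grapes", "orange"] AS fruit_array, [0.1, 0.4, 0.3, 0.2] AS probability
)
-- WITH OFFSET exposes each element's 0-based position in its array;
-- joining on equal offsets lines up fruit_array[i] with probability[i]
SELECT customerid, off1 AS position, fruit, p
FROM results,
UNNEST(probability) AS p WITH OFFSET off1
JOIN UNNEST(fruit_array) AS fruit WITH OFFSET off2
ON off1 = off2
ORDER BY customerid, position
-- customerid  position  fruit    p
-- 1           0         apples   0.1
-- 1           1         bananas  0.4
-- 1           2         grapes   0.3
-- 1           3         orange   0.2
```

This relies on both arrays having the same length and the same ordering, as they do in the sample data; with an inner join, any element without a partner at the same offset would simply be dropped.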
Or, possibly a better option
#standardSQL
WITH results AS (
  SELECT 1 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.1,0.4,0.3,0.2] AS probability UNION ALL
  SELECT 2 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.2,0.1,0.6,0.1] AS probability UNION ALL
  SELECT 3 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.5,0.05,0.35,0.1] AS probability
)
SELECT customerid, fruit, probability
FROM (
  SELECT customerid,
    (
      -- Correlated subquery over this row's own arrays, so no GROUP BY is needed
      SELECT ARRAY_AGG(STRUCT(fruit, probability) ORDER BY probability DESC LIMIT 2)
      FROM UNNEST(probability) probability WITH OFFSET off1
      JOIN UNNEST(fruit_array) fruit WITH OFFSET off2
      ON off1 = off2
    ) top
  FROM results
), UNNEST(top)
with the same result
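For comparison, the same "top 2 per customer" idea can also be sketched with a window function instead of ARRAY_AGG ... LIMIT. This is an alternative technique, not one of the queries above; the names prob and rn are my own, and it assumes the two arrays always have the same length so that probability[OFFSET(off)] never goes out of range.

```sql
#standardSQL
WITH results AS (
  SELECT 1 AS customerid, ["apples", "bananas", "grapes", "orange"] AS fruit_array, [0.1, 0.4, 0.3, 0.2] AS probability UNION ALL
  SELECT 2 AS customerid, ["apples", "bananas", "grapes", "orange"] AS fruit_array, [0.2, 0.1, 0.6, 0.1] AS probability UNION ALL
  SELECT 3 AS customerid, ["apples", "bananas", "grapes", "orange"] AS fruit_array, [0.5, 0.05, 0.35, 0.1] AS probability
)
SELECT customerid, fruit, prob AS probability
FROM (
  SELECT
    customerid,
    fruit,
    -- look up the probability at the same position as the fruit
    probability[OFFSET(off)] AS prob,
    -- rank the fruits within each customer, highest probability first
    ROW_NUMBER() OVER (PARTITION BY customerid ORDER BY probability[OFFSET(off)] DESC) AS rn
  FROM results, UNNEST(fruit_array) AS fruit WITH OFFSET off
)
WHERE rn <= 2
ORDER BY customerid, rn
```

Note that ROW_NUMBER breaks ties arbitrarily, just as ORDER BY ... LIMIT 2 does. For customer 2, apples (0.2) is the unambiguous runner-up, but if two fruits shared the second-highest probability, either one could be returned.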