Sort one array column relative to another array column in BigQuery

Time: 2018-10-01 18:43:01

Tags: google-bigquery

I have the following table in BigQuery:

WITH results AS
  (SELECT 1 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.1,0.4,0.3,0.2] as probability
  UNION ALL
  SELECT 2 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.2,0.1,0.6,0.1] as probability
  UNION ALL
  SELECT 3 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.5,0.05,0.35,0.1] as probability
  )
 select * from results

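Here, each customer has a certain probability of buying each fruit. For each customer, I want the top 2 fruits by purchase probability, together with those probabilities.

Output like the following would be ideal: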

customerid, fruits, probability
1, bananas, 0.4
1, grapes, 0.3
..

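In the final result above, for customerid 1 I select only bananas and grapes, because these two fruits have the highest purchase probabilities (from [0.1,0.4,0.3,0.2]).

Is there a function in BigQuery that I can use to achieve this?

1 Answer:

Answer 0: (score: 1)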

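The following is for BigQuery Standard SQL. It pairs each fruit with the probability at the same array position, then keeps the two highest-probability pairs per customer: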

#standardSQL
WITH results AS (
  SELECT 1 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.1,0.4,0.3,0.2] AS probability   UNION ALL
  SELECT 2 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.2,0.1,0.6,0.1] AS probability   UNION ALL
  SELECT 3 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.5,0.05,0.35,0.1] AS probability
)
SELECT customerid, fruit, probability
FROM (
  SELECT customerid, ARRAY_AGG(STRUCT(fruit, probability) ORDER BY probability DESC LIMIT 2) top
  FROM results, 
    UNNEST(probability) probability WITH OFFSET off1
    JOIN UNNEST(fruit_array) fruit WITH OFFSET off2
    ON off1 = off2
  GROUP BY customerid
), UNNEST(top)  

with the result:

Row customerid  fruit   probability  
1   1           bananas 0.4  
2   1           grapes  0.3  
3   2           grapes  0.6  
4   2           apples  0.2  
5   3           apples  0.5  
6   3           grapes  0.35     
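
To see the pairing step in isolation, here is a minimal sketch that uses only the two literal arrays for customerid 1 from the question. It joins the two UNNEST outputs on their WITH OFFSET positions, so each fruit ends up next to the probability stored at the same index. The ORDER BY is added only to make the output order predictable.

```sql
#standardSQL
-- Pair elements of two arrays by position (values taken from customerid 1).
SELECT off1 AS position, fruit, probability
FROM UNNEST([0.1, 0.4, 0.3, 0.2]) AS probability WITH OFFSET off1
JOIN UNNEST(["apples", "bananas", "grapes", "orange"]) AS fruit WITH OFFSET off2
  ON off1 = off2   -- same index in both arrays
ORDER BY position
```

Because this is an inner join on the offsets, an element with no counterpart at the same position in the other array (when the arrays differ in length) is dropped.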

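Or, possibly a better option, which builds the top-2 array for each row in a correlated subquery instead of grouping by customerid: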

#standardSQL
WITH results AS (
  SELECT 1 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.1,0.4,0.3,0.2] AS probability   UNION ALL
  SELECT 2 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.2,0.1,0.6,0.1] AS probability   UNION ALL
  SELECT 3 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.5,0.05,0.35,0.1] AS probability
)
SELECT customerid, fruit, probability
FROM (
  SELECT customerid, 
    (
      SELECT ARRAY_AGG(STRUCT(fruit, probability) ORDER BY probability DESC LIMIT 2) 
      FROM   UNNEST(probability) probability WITH OFFSET off1
      JOIN UNNEST(fruit_array) fruit WITH OFFSET off2
      ON off1 = off2
    ) top
  FROM results
), UNNEST(top)

with the same result.
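
As a further variation (not part of the original answer), the join can be avoided by indexing fruit_array directly with the offset of each probability. This is a sketch under the assumption that both arrays always have the same length: fruit_array[OFFSET(off)] raises an error when the index is out of range, and SAFE_OFFSET could be used instead to get NULL. The aliases prob, off, and t are illustrative names.

```sql
#standardSQL
WITH results AS (
  SELECT 1 AS customerid, ["apples", "bananas", "grapes", "orange"] AS fruit_array, [0.1, 0.4, 0.3, 0.2] AS probability UNION ALL
  SELECT 2 AS customerid, ["apples", "bananas", "grapes", "orange"] AS fruit_array, [0.2, 0.1, 0.6, 0.1] AS probability UNION ALL
  SELECT 3 AS customerid, ["apples", "bananas", "grapes", "orange"] AS fruit_array, [0.5, 0.05, 0.35, 0.1] AS probability
)
SELECT customerid, t.fruit, t.probability
FROM results,
  UNNEST(ARRAY(
    -- For each row, look up the fruit at the same position as the probability
    -- and keep only the two highest probabilities.
    SELECT AS STRUCT fruit_array[OFFSET(off)] AS fruit, prob AS probability
    FROM UNNEST(probability) AS prob WITH OFFSET off
    ORDER BY prob DESC
    LIMIT 2
  )) AS t
ORDER BY customerid, t.probability DESC
```

With the sample data this should return the same six rows as the two queries above. Ties in probability are broken arbitrarily by LIMIT 2 in all three versions.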