Find the array that contains the most items from an input array

Time: 2019-01-31 23:16:06

Tags: google-bigquery

I have two tables, as shown below:

Table 1:
[1,2,3,4,5]

Table 2:
[2,3,4]
[1,4]
[9,5,7]

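My goal is to find the array in Table 2 that contains the greatest number of elements from Table 1. In this example, the expected result is the record [2,3,4] from Table 2.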

So far I have the following, but I am struggling to work in the "most matching elements" logic:

#standardSQL
WITH query_items AS (
  SELECT [96072688,25185958] AS items
),
lookup_values AS (
  SELECT antecedent from recommendation_engine.association_rules
)
SELECT query_items.items, lookup_values.antecedent
FROM query_items, lookup_values, UNNEST([(SELECT ARRAY_LENGTH(query_items.items) - COUNT(1) 
                      FROM UNNEST(query_items.items) AS input 
                      JOIN UNNEST(lookup_values.antecedent)  AS output 
                      ON input = output)]) AS results
WHERE results = 0

Thanks in advance for any help you can provide!

1 Answer:

Answer 0: (score: 2)

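The example below (for BigQuery Standard SQL) should give you an idea. It cross joins the two tables, counts the elements each candidate array shares with the target array in a correlated subquery, and keeps the row with the highest count: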

#standardSQL
WITH `project.dataset.table1` AS (
  SELECT [1,2,3,4,5] target
), `project.dataset.table2` AS (
  SELECT [2,3,4] candidates UNION ALL
  SELECT [1,4] UNION ALL
  SELECT [9,5,7] 
)
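SELECT *, 
  -- Count the elements shared by the target array and this candidate array
  (SELECT COUNT(1) 
    FROM t1.target x 
    JOIN t2.candidates y 
    ON x=y
  ) matches
FROM `project.dataset.table1` t1
CROSS JOIN `project.dataset.table2` t2
-- Keep only the candidate with the most matches
ORDER BY matches DESC
LIMIT 1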

with the result:

#   target      candidates  matches
1   [1,2,3,4,5] [2,3,4]     3
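
One thing to keep in mind: ORDER BY ... LIMIT 1 returns a single row, so if several candidates tie for the highest count, only one of them comes back, and which one is not guaranteed. If you need all tied candidates, a rough sketch along these lines might help. It is an illustration, not tested against your real data: the names table1, table2, scored, and rnk are made up for the example, and the second candidate was changed to [1,4,5] to force a tie. It also uses COUNT(DISTINCT x) on the assumption that repeated values inside an array should only be counted once.

```sql
#standardSQL
WITH table1 AS (
  SELECT [1,2,3,4,5] AS target
), table2 AS (
  SELECT [2,3,4] AS candidates UNION ALL
  SELECT [1,4,5] UNION ALL   -- changed from [1,4] to create a tie (illustration only)
  SELECT [9,5,7]
), scored AS (
  SELECT
    t1.target,
    t2.candidates,
    -- Distinct target elements that also appear in this candidate array
    (SELECT COUNT(DISTINCT x)
      FROM UNNEST(t1.target) AS x
      JOIN UNNEST(t2.candidates) AS y
      ON x = y
    ) AS matches
  FROM table1 AS t1
  CROSS JOIN table2 AS t2
)
SELECT target, candidates, matches
FROM (
  -- RANK() gives every row tied for the highest match count a rank of 1
  SELECT *, RANK() OVER (ORDER BY matches DESC) AS rnk
  FROM scored
)
WHERE rnk = 1
```

With this sample data, both [2,3,4] and [1,4,5] should come back with matches = 3. To run it against real tables, replace the two inline CTEs with your own table and column names.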