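How do you compare two arrays in BigQuery?

Time: 2017-04-27 19:36:45

Tags: google-bigquery

I am trying to join two tables that each have an array column, using something like either of the following: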

SELECT a.id, b.value
FROM a INNER JOIN b
ON a.array IN b.array

SELECT a.id, b.value
FROM a INNER JOIN b
ON UNNEST(a.array) IN UNNEST(b.array)

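According to this SO question, postgres has operators such as <@ and @> that test whether one array is a subset of the other (postgres doc page), but BigQuery only allows a single array element to be compared against another array, as follows: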

a.arrayelement IN UNNEST(b.array)

Can this be done in BigQuery?

Edit

This is the schema I am working with:

WITH b AS (
    {  "ip": "192.168.1.1",
      "cookie": [
        { "key": "apple",
          "value": "red"
        },
        { "key": "peach",
          "value": "pink"
        },
        { "key": "orange",
          "value": "orange"
        }
      ]
    }
    ,{  "ip": "192.168.1.2",
      "cookie": [
        { "key": "apple",
          "value": "red"
        },
        { "key": "orange",
          "value": "orange"
        }
      ]
    }
   ),
a AS (
    {  "id": "12345",
      "cookie": [
        { "key": "peach",
          "value": "pink"
        }
      ]
    }
    ,{  "id": "67890",
      "cookie": [
        { "key": "apple",
          "value": "red"
        },
        { "key": "orange",
          "value": "orange"
        }
      ]
     }
)

I am expecting output like the following:

ip, id
192.168.1.1, 67890 
192.168.1.2, 67890 
192.168.1.2, 12345

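This is a continuation of the following SO question, How do I find elements in an array in BigQuery. I tried using a subquery to compare individual elements of one of the arrays, but BigQuery returns an error saying that I have "too many subqueries".

2 Answers:

Answer 0: (score: 6)

Here is an alternative solution that avoids running a JOIN inside a correlated subquery and relies on the IN UNNEST() expression instead - this should give better performance: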

#standardSQL
WITH a AS (
  SELECT 1 AS id, [2,4] AS a_arr UNION ALL
  SELECT 2, [3,5]
),
b AS (
  SELECT 11 AS value, [1,2,3,4] AS b_arr UNION ALL
  SELECT 12, [1,3,5,6]
)
SELECT a.id, b.value
FROM a, b
-- keep a pair only when every element of a_arr is also found in b_arr
WHERE (SELECT LOGICAL_AND(a_i IN UNNEST(b.b_arr)) FROM UNNEST(a.a_arr) a_i)
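
The same pattern can probably be carried over to the cookie schema from the question. The following is only a sketch under a few assumptions: the JSON-like rows are modelled as arrays of STRUCTs, each key/value pair is flattened into a single string so that it can be compared with IN UNNEST(), and the '=' delimiter does not occur inside any key. The CTE names a_pairs and b_pairs and the column name pairs are made up for illustration.

```sql
#standardSQL
WITH b AS (
  SELECT '192.168.1.1' AS ip,
         [STRUCT('apple' AS key, 'red' AS value),
          STRUCT('peach' AS key, 'pink' AS value),
          STRUCT('orange' AS key, 'orange' AS value)] AS cookie UNION ALL
  SELECT '192.168.1.2',
         [STRUCT('apple' AS key, 'red' AS value),
          STRUCT('orange' AS key, 'orange' AS value)]
),
a AS (
  SELECT '12345' AS id,
         [STRUCT('peach' AS key, 'pink' AS value)] AS cookie UNION ALL
  SELECT '67890',
         [STRUCT('apple' AS key, 'red' AS value),
          STRUCT('orange' AS key, 'orange' AS value)]
),
-- Hypothetical helper CTEs: turn each cookie struct into one "key=value" string
-- (assumes '=' never appears inside a key).
b_pairs AS (
  SELECT ip,
         ARRAY(SELECT CONCAT(c.key, '=', c.value) FROM UNNEST(cookie) AS c) AS pairs
  FROM b
),
a_pairs AS (
  SELECT id,
         ARRAY(SELECT CONCAT(c.key, '=', c.value) FROM UNNEST(cookie) AS c) AS pairs
  FROM a
)
SELECT b_pairs.ip, a_pairs.id
FROM a_pairs, b_pairs
-- keep a pair only when every cookie of a is also present in b
WHERE (
  SELECT LOGICAL_AND(p IN UNNEST(b_pairs.pairs))
  FROM UNNEST(a_pairs.pairs) AS p
)
```

With the sample rows exactly as written, this subset rule pairs id 67890 with both ips and id 12345 with 192.168.1.1, the only ip whose cookies include peach. A row of a with an empty cookie array is dropped, because LOGICAL_AND over zero rows yields NULL.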

Answer 1: (score: 4)

Try the following approach (BigQuery Standard SQL). It mimics the pseudocode:

SELECT a.id, b.value
FROM a INNER JOIN b
ON a.array IN b.array
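
A query in this spirit, with a JOIN running inside a correlated subquery, might look like the sketch below. It is an illustration rather than a definitive version: it counts the elements of a's array that find a match in b's array and keeps the pair when that count equals the array length, which assumes neither array contains duplicate elements. The aliases a_i and b_i are made up, and the sample data is borrowed from the other answer.

```sql
#standardSQL
WITH a AS (
  SELECT 1 AS id, [2,4] AS a_arr UNION ALL
  SELECT 2, [3,5]
),
b AS (
  SELECT 11 AS value, [1,2,3,4] AS b_arr UNION ALL
  SELECT 12, [1,3,5,6]
)
SELECT a.id, b.value
FROM a, b
WHERE (
  -- number of elements of a_arr that also occur in b_arr
  -- (assumes no duplicates in either array)
  SELECT COUNT(1)
  FROM UNNEST(a.a_arr) AS a_i
  JOIN UNNEST(b.b_arr) AS b_i
  ON a_i = b_i
) = ARRAY_LENGTH(a.a_arr)
```

On this sample data only id 1 with value 11 and id 2 with value 12 satisfy the condition, since [2,4] is contained in [1,2,3,4] and [3,5] is contained in [1,3,5,6].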

Let me know if you would like me to apply this to your example - or you may want to try it yourself first :o)