Optimal query to BigQuery for many fields of array type

Asked: 2017-01-31 13:01:43

Tags: google-bigquery

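In Google BigQuery, I have defined a table with 5 fields, which I load from JSON. The schema is as follows; let's call the table user_data:

userid: String
cats: Array[Int]
features: Array[Long]
segments: Array[Int]
tags: Array[Int]

Array types in BigQuery are just repeated fields.

I need to run queries that count the userid values whose cats, segments, and tags each match a given list of values (for example cats in (123, 265), segments in (555, 666, 777), and tags in (100, 200)).

What is the most performant way to run such a query, and what syntax should it use?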

2 answers:

Answer 0 (score: 1)

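Try the query below. It is written for BigQuery Standard SQL. Each subquery counts the distinct listed values found in an array, so a row passes only when every listed value is present.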

#standardSQL
WITH user_data AS (
  SELECT '1' AS userid, ARRAY<INT64>[123,265] AS cats, ARRAY<INT64>[1,2] AS features, ARRAY<INT64>[555,666,777] AS segments, ARRAY<INT64>[100, 200] AS tags UNION ALL
  SELECT '2' AS userid, ARRAY<INT64>[1231,265] AS cats, ARRAY<INT64>[1,2] AS features, ARRAY<INT64>[555,666,777] AS segments, ARRAY<INT64>[100, 200] AS tags UNION ALL
  SELECT '3' AS userid, ARRAY<INT64>[123,265] AS cats, ARRAY<INT64>[1,2] AS features, ARRAY<INT64>[5551,666,777] AS segments, ARRAY<INT64>[100, 200] AS tags 
)
SELECT COUNT(userid) AS count_userid
FROM user_data
WHERE (SELECT COUNT(DISTINCT cat) FROM UNNEST(cats) AS cat WHERE cat IN (123, 265)) = 2
AND (SELECT COUNT(DISTINCT segment) FROM UNNEST(segments) AS segment WHERE segment IN (555,666,777)) = 3
AND (SELECT COUNT(DISTINCT tag) FROM UNNEST(tags) AS tag WHERE tag IN (100, 200)) = 2
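
To see why a given row passes or fails the filter above, it can help to select the per-array match counts instead of filtering on them. The following is only a sketch on the same sample rows; the column aliases matched_cats, matched_segments, and matched_tags are made up for illustration.

```sql
#standardSQL
WITH user_data AS (
  SELECT '1' AS userid, ARRAY<INT64>[123,265] AS cats, ARRAY<INT64>[1,2] AS features, ARRAY<INT64>[555,666,777] AS segments, ARRAY<INT64>[100, 200] AS tags UNION ALL
  SELECT '2' AS userid, ARRAY<INT64>[1231,265] AS cats, ARRAY<INT64>[1,2] AS features, ARRAY<INT64>[555,666,777] AS segments, ARRAY<INT64>[100, 200] AS tags UNION ALL
  SELECT '3' AS userid, ARRAY<INT64>[123,265] AS cats, ARRAY<INT64>[1,2] AS features, ARRAY<INT64>[5551,666,777] AS segments, ARRAY<INT64>[100, 200] AS tags
)
SELECT
  userid,
  -- Number of distinct listed values found in each array (hypothetical aliases).
  (SELECT COUNT(DISTINCT cat) FROM UNNEST(cats) AS cat WHERE cat IN (123, 265)) AS matched_cats,
  (SELECT COUNT(DISTINCT segment) FROM UNNEST(segments) AS segment WHERE segment IN (555,666,777)) AS matched_segments,
  (SELECT COUNT(DISTINCT tag) FROM UNNEST(tags) AS tag WHERE tag IN (100, 200)) AS matched_tags
FROM user_data
ORDER BY userid
```

On the sample rows this should return 2, 3, 2 for user 1; 1, 3, 2 for user 2; and 2, 2, 2 for user 3. Only user 1 reaches the 2, 3, 2 that the WHERE clause above requires, so that query counts one user.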

Answer 1 (score: 1)

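A variation on Mikhail's answer. I believe Julias wants to count the users for whom the condition is true on every dimension, i.e. at least one of the constants matches in each array. In that case EXISTS will be more efficient than COUNT(DISTINCT), i.e.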

#standardSQL
WITH user_data AS (
  SELECT '1' AS userid, ARRAY<INT64>[123,265] AS cats, ARRAY<INT64>[1,2] AS features, ARRAY<INT64>[555,666,777] AS segments, ARRAY<INT64>[100, 200] AS tags UNION ALL
  SELECT '2' AS userid, ARRAY<INT64>[1231,265] AS cats, ARRAY<INT64>[1,2] AS features, ARRAY<INT64>[555,666,777] AS segments, ARRAY<INT64>[100, 200] AS tags UNION ALL
  SELECT '3' AS userid, ARRAY<INT64>[123,265] AS cats, ARRAY<INT64>[1,2] AS features, ARRAY<INT64>[5551,666,777] AS segments, ARRAY<INT64>[100, 200] AS tags 
)
SELECT COUNT(userid) AS count_userid
FROM user_data
WHERE EXISTS(SELECT 1 FROM UNNEST(cats) cat WHERE cat IN (123, 265))
AND EXISTS(SELECT 1 FROM UNNEST(segments) segment WHERE segment IN (555,666,777))
AND EXISTS(SELECT 1 FROM UNNEST(tags) tag WHERE tag IN (100, 200))
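
The two answers implement different semantics: COUNT(DISTINCT ...) = N requires every listed value to be present, while EXISTS requires at least one. A rough sketch that puts both side by side on the same sample rows, so the difference is visible in one result; the names all_match, any_match, users_matching_all, and users_matching_any are illustrative, not from the original posts.

```sql
#standardSQL
WITH user_data AS (
  SELECT '1' AS userid, ARRAY<INT64>[123,265] AS cats, ARRAY<INT64>[1,2] AS features, ARRAY<INT64>[555,666,777] AS segments, ARRAY<INT64>[100, 200] AS tags UNION ALL
  SELECT '2' AS userid, ARRAY<INT64>[1231,265] AS cats, ARRAY<INT64>[1,2] AS features, ARRAY<INT64>[555,666,777] AS segments, ARRAY<INT64>[100, 200] AS tags UNION ALL
  SELECT '3' AS userid, ARRAY<INT64>[123,265] AS cats, ARRAY<INT64>[1,2] AS features, ARRAY<INT64>[5551,666,777] AS segments, ARRAY<INT64>[100, 200] AS tags
),
flags AS (
  SELECT
    userid,
    -- "All" semantics: every listed value must appear in the array.
    (SELECT COUNT(DISTINCT cat) FROM UNNEST(cats) AS cat WHERE cat IN (123, 265)) = 2
      AND (SELECT COUNT(DISTINCT segment) FROM UNNEST(segments) AS segment WHERE segment IN (555,666,777)) = 3
      AND (SELECT COUNT(DISTINCT tag) FROM UNNEST(tags) AS tag WHERE tag IN (100, 200)) = 2 AS all_match,
    -- "Any" semantics: at least one listed value must appear in the array.
    EXISTS(SELECT 1 FROM UNNEST(cats) AS cat WHERE cat IN (123, 265))
      AND EXISTS(SELECT 1 FROM UNNEST(segments) AS segment WHERE segment IN (555,666,777))
      AND EXISTS(SELECT 1 FROM UNNEST(tags) AS tag WHERE tag IN (100, 200)) AS any_match
  FROM user_data
)
SELECT
  COUNTIF(all_match) AS users_matching_all,
  COUNTIF(any_match) AS users_matching_any
FROM flags
```

On the sample rows this should give users_matching_all = 1 (only user 1 has every listed value) and users_matching_any = 3 (every user has at least one match per array). Which of the two is right depends on what the question intended.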