Automatically create a series of variables in BigQuery

Time: 2017-02-13 10:46:36

Tags: sql for-loop google-bigquery

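I want to automatically create "directory variables" from a set of URIs, up to the maximum number of directories.

For example, if I have a URI with 4 directories, "/A/B/C/17628.html", I want to create the following variables:

  1. path_1 = "A"
  2. path_2 = "B"
  3. path_3 = "C"
  4. path_4 = "17628.html"

But if I have "/A/D/E/F/178.html":

  1. path_1 = "A"
  2. path_2 = "D"
  3. path_3 = "E"
  4. path_4 = "F"
  5. path_5 = "178.html"

A URI may contain many directories (up to 20). To avoid creating all of these variables by hand, I would like to define them with a for loop (or some other option). Is it possible to use such a loop in BigQuery?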

2 answers:

Answer 0: (score: 1)

Consider the following version:

  
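#standardSQL
WITH yourTable AS (
  SELECT '/A/B/C/17628.html' AS uri UNION ALL
  SELECT '/A/D/E/F/178.html' AS uri
)
-- REGEXP_EXTRACT keeps only the text between the first and the last '/',
-- so the trailing file name is not part of the output.
-- SPLIT + UNNEST ... WITH OFFSET then yields one row per directory.
SELECT uri, CONCAT('path_', CAST(1 + OFFSET AS STRING)) AS pos, path
FROM yourTable, UNNEST(SPLIT(REGEXP_EXTRACT(uri, r'/(.*)/'), '/')) path WITH OFFSET
ORDER BY uri, OFFSET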

The result is:

uri                 pos     path     
/A/B/C/17628.html   path_1     A     
/A/B/C/17628.html   path_2     B     
/A/B/C/17628.html   path_3     C     
/A/D/E/F/178.html   path_1     A     
/A/D/E/F/178.html   path_2     D     
/A/D/E/F/178.html   path_3     E     
/A/D/E/F/178.html   path_4     F     

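In most practical cases, a flattened schema like this is easier to work with (query) than a pivoted one.

If you still want to pivot the result above, see one of my many answers on the topic: Transpose rows into columns in BigQuery (Pivot implementation)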
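For reference, one common way to pivot such a flat result is conditional aggregation: one MAX(IF(...)) expression per target column, grouped by uri. The following is only a sketch of that general technique and is not necessarily what the linked answer does. It assumes the flat result has the columns uri, pos and path shown above, and the helper name gen_pivot_columns is hypothetical, invented here for illustration.

```shell
#!/bin/sh
# gen_pivot_columns N: print N select-list lines of the form
#   MAX(IF(pos = 'path_i', path, NULL)) AS path_i
# separated by commas, with no comma after the last line.
# (Hypothetical helper; the column names uri/pos/path are assumed
# to match the flat result shown above.)
gen_pivot_columns() {
  max=${1:-20}   # default: the 20 levels mentioned in the question
  i=1
  while [ "$i" -le "$max" ]; do
    if [ "$i" -lt "$max" ]; then sep=','; else sep=''; fi
    printf "  MAX(IF(pos = 'path_%d', path, NULL)) AS path_%d%s\n" "$i" "$i" "$sep"
    i=$((i + 1))
  done
}

# Example: print the expressions for two levels.
gen_pivot_columns 2
```

The printed lines would then go into a query of the form SELECT uri, <generated lines> FROM (<the flat query>) GROUP BY uri.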

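Answer 1: (score: 0)

You need to list the columns in the select list explicitly; the columns themselves cannot be dynamic. If you can accept the result as an array, you can do the following: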

#standardSQL
WITH T AS (
  SELECT '/A/B/C/17628.html' AS path UNION ALL
  SELECT '/A/D/E/F/178.html' AS path
)
SELECT
  -- Offset 0 of subpaths is the empty string before the leading '/',
  -- so read offsets 1 through 20; missing entries become ''.
  ARRAY(SELECT IFNULL(subpaths[SAFE_OFFSET(x)], '')
        FROM UNNEST(GENERATE_ARRAY(1, 20)) AS x) AS paths
FROM (
  SELECT SPLIT(path, '/') AS subpaths
  FROM T
);

If you want explicit path_1, path_2, etc. columns, you can do this instead:

#standardSQL
WITH T AS (
  SELECT '/A/B/C/17628.html' AS path UNION ALL
  SELECT '/A/D/E/F/178.html' AS path
)
SELECT
  subpaths[SAFE_OFFSET(1)] AS path_1,
  subpaths[SAFE_OFFSET(2)] AS path_2,
  subpaths[SAFE_OFFSET(3)] AS path_3,
  subpaths[SAFE_OFFSET(4)] AS path_4,
  subpaths[SAFE_OFFSET(5)] AS path_5,
  subpaths[SAFE_OFFSET(6)] AS path_6,
  subpaths[SAFE_OFFSET(7)] AS path_7,
  subpaths[SAFE_OFFSET(8)] AS path_8,
  subpaths[SAFE_OFFSET(9)] AS path_9,
  subpaths[SAFE_OFFSET(10)] AS path_10,
  subpaths[SAFE_OFFSET(11)] AS path_11,
  subpaths[SAFE_OFFSET(12)] AS path_12,
  subpaths[SAFE_OFFSET(13)] AS path_13,
  subpaths[SAFE_OFFSET(14)] AS path_14,
  subpaths[SAFE_OFFSET(15)] AS path_15,
  subpaths[SAFE_OFFSET(16)] AS path_16,
  subpaths[SAFE_OFFSET(17)] AS path_17,
  subpaths[SAFE_OFFSET(18)] AS path_18,
  subpaths[SAFE_OFFSET(19)] AS path_19,
  subpaths[SAFE_OFFSET(20)] AS path_20
FROM (
  SELECT SPLIT(path, '/') AS subpaths
  FROM T
);

Since I didn't want to write that list by hand, I ran a simple one-liner in the terminal:

for i in `seq 1 20`; do echo "subpaths[SAFE_OFFSET($i)] AS path_$i,"; done
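As a slightly more reusable variant of that one-liner, here is a minimal sketch of a shell function that prints the column list for an arbitrary maximum depth and leaves off the comma after the last column. The function name gen_path_columns and its argument are hypothetical, introduced only for illustration; the column expressions are the ones used in the query above.

```shell
#!/bin/sh
# gen_path_columns N: print N select-list lines of the form
#   subpaths[SAFE_OFFSET(i)] AS path_i
# separated by commas, with no comma after the last line.
# (Hypothetical helper name; not part of BigQuery or any tool.)
gen_path_columns() {
  max=${1:-20}   # default: the 20 directories mentioned in the question
  i=1
  while [ "$i" -le "$max" ]; do
    if [ "$i" -lt "$max" ]; then sep=','; else sep=''; fi
    printf '  subpaths[SAFE_OFFSET(%d)] AS path_%d%s\n' "$i" "$i" "$sep"
    i=$((i + 1))
  done
}

# Example: print the column list for up to 3 path segments.
gen_path_columns 3
```

Calling gen_path_columns with no argument prints all 20 columns, ready to paste between SELECT and FROM.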