Split a string into substrings and create a new column for each in BigQuery

Date: 2017-05-18 19:25:21

Tags: sql google-bigquery

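I want to split a space-delimited string into 5 parts and create a column for each part, but I am having trouble producing the desired output. Edit: I am using the standard SQL dialect.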

Sample data:

Row published_at                data_string          device_id
1   2016-10-26T22:53:03.209Z    70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan    2a0025000351353337353037
... 
1 of 570 rows

Desired output:

Row published_at                battery temp1  humid1 temp2  humid2 temp3 humid3 device_id   
1   2016-11-03T16:24:09.833Z    70.77 3.38 61.65 7.98 73.20 3.29 63.55 2a0025000351353337353037 
1 of 570 rows

Attempted query 1.a:

WITH
  h2a0025_2 AS (
  SELECT
    TIMESTAMP '2016-10-26T22:53:03.209Z' AS published_at,
    '70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan' AS data_string,
    '2a0025000351353337353037' AS device_id
  UNION ALL
  SELECT
    TIMESTAMP '2016-10-26T22:53:03.209Z',
    '70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan',
    '2a0025000351353337353037' )
SELECT
  published_at,
  parts[OFFSET(0)] AS Battery,
  parts[OFFSET(1)] AS Temp1,
  parts[OFFSET(2)] AS Humid1,
  parts[OFFSET(3)] AS Temp2,
  parts[OFFSET(4)] AS Humid2,
  parts[OFFSET(5)] AS Temp3,
  parts[OFFSET(6)] AS Humid3,
  device_id
FROM (
  SELECT
    * EXCEPT(data_string),
    SPLIT(data_string, ' ') AS parts
  FROM
    `h2a0025_2`);

Result 1.a: 2 identical rows

  Row   published_at                battery temp1  humid1 temp2  humid2 temp3 humid3 device_id   
    1   2016-11-03T16:24:09.833Z    70.77 3.38 61.65 7.98 73.20 3.29 63.55 2a0025000351353337353037 
    2   2016-11-03T16:24:09.833Z    70.77 3.38 61.65 7.98 73.20 3.29 63.55 2a0025000351353337353037
2 of 2 rows

Attempt 2:

 SELECT
      published_at,
      parts[OFFSET(0)] AS Battery,
      parts[OFFSET(1)] AS Temp1,
      parts[OFFSET(2)] AS Humid1,
      parts[OFFSET(3)] AS Temp2,
      parts[OFFSET(4)] AS Humid2,
      parts[OFFSET(5)] AS Temp3,
      parts[OFFSET(6)] AS Humid3,
      device_id
    FROM (
      SELECT
        * EXCEPT(data_string),
        SPLIT(data_string, ' ') AS parts
      FROM
        `myproject.mydataset.h2a0025_2`);

Result: Query failed. Error: Array index 3 is out of bounds (overflow)

1 answer:

Answer 0 (score: 2):

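Here is an example to get you started. Rather than trying to work out the correct substring positions, use the SPLIT function to turn the string into an array, then select the offsets you need from that array.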

#standardSQL
WITH YourTable AS (
  SELECT
    TIMESTAMP '2016-11-03T16:24:09.833Z' AS published_at,
    '80.91 22.15 45.35 14.41 64.54' AS data_string
  UNION ALL
  SELECT
    TIMESTAMP '2016-11-04T18:34:08.143Z',
    '75.37 28.43 31.17 34.80 19.33'
)
SELECT
  published_at,
  parts[OFFSET(0)] AS Temp1,
  parts[OFFSET(1)] AS Humid1,
  parts[OFFSET(2)] AS Temp2,
  parts[OFFSET(3)] AS Humid2
FROM (
  SELECT
    * EXCEPT(data_string),
    SPLIT(data_string, ' ') AS parts
  FROM YourTable
);

To test against your real table, use only the following part of the script:

#standardSQL
SELECT
  published_at,
  parts[OFFSET(0)] AS Temp1,
  parts[OFFSET(1)] AS Humid1,
  parts[OFFSET(2)] AS Temp2,
  parts[OFFSET(3)] AS Humid2
FROM (
  SELECT
    * EXCEPT(data_string),
    SPLIT(data_string, ' ') AS parts
  FROM `yourproject.yourdataset.yourtable`
);
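The "Array index 3 is out of bounds" error from the question suggests that at least one row in the real table splits into fewer elements than the query indexes. A possible workaround, sketched below, is to use SAFE_OFFSET, which returns NULL instead of raising an error when the index is past the end of the array. The sample rows here are hypothetical (the second one is deliberately short to mimic a malformed row), and the mapping of the seven positions to column names is assumed from the desired output in the question.

```sql
#standardSQL
-- Hypothetical sample data: the second row is deliberately truncated
-- to imitate a row that would trigger the out-of-bounds error.
WITH YourTable AS (
  SELECT
    TIMESTAMP '2016-10-26T22:53:03.209Z' AS published_at,
    '70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan' AS data_string,
    '2a0025000351353337353037' AS device_id
  UNION ALL
  SELECT
    TIMESTAMP '2016-10-26T22:54:03.209Z',
    '70.77 3.38',
    '2a0025000351353337353037'
)
SELECT
  published_at,
  -- SAFE_OFFSET yields NULL where OFFSET would fail on a short array.
  parts[SAFE_OFFSET(0)] AS Battery,
  parts[SAFE_OFFSET(1)] AS Temp1,
  parts[SAFE_OFFSET(2)] AS Humid1,
  parts[SAFE_OFFSET(3)] AS Temp2,
  parts[SAFE_OFFSET(4)] AS Humid2,
  parts[SAFE_OFFSET(5)] AS Temp3,
  parts[SAFE_OFFSET(6)] AS Humid3,
  device_id,
  -- Number of elements actually present; useful for spotting malformed rows.
  ARRAY_LENGTH(parts) AS part_count
FROM (
  SELECT
    * EXCEPT(data_string),
    SPLIT(data_string, ' ') AS parts
  FROM YourTable
);
```

With this sketch the short row should come back with NULL in the columns it cannot fill rather than failing the whole query, and filtering on part_count is one way to find the offending rows in the real table.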