Flatten nested BigQuery field contents into new columns instead of rows

Time: 2016-08-08 22:41:56

Tags: google-bigquery

I have some BigQuery data in the following format:

"thing": [
  {
    "name": "gameLost",
    "params": [
      {
        "key": "total_games",
        "val": {
          "str_val": "3",
          "int_val": null
        }
      },
      {
        "key": "games_won",
        "val": {
          "str_val": "2",
          "int_val": null
        }
      },
      {
        "key": "game_time",
        "val": {
          "str_val": "44",
          "int_val": null
        }
      }
    ],
    "dt_a": "1470625311138000",
    "dt_b": "1470620345566000"
  }
]

I know that the FLATTEN() function will produce 3 rows of output, like this:

+------------+------------------+------------------+------------------+--------------------------+--------------------------+
| thing.name | thing.dt_a       | thing.dt_b       | thing.params.key | thing.params.val.str_val | thing.params.val.int_val |
+------------+------------------+------------------+------------------+--------------------------+--------------------------+
| gameLost   | 1470625311138000 | 1470620345566000 | total_games      | 3                        | null                     |
| gameLost   | 1470625311138000 | 1470620345566000 | games_won        | 2                        | null                     |
| gameLost   | 1470625311138000 | 1470620345566000 | game_time        | 44                       | null                     |
+------------+------------------+------------------+------------------+--------------------------+--------------------------+

where the higher-level keys/values are repeated on a new row for each deeper-level object.

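However, I need the deeper keys/values to be output as entirely new columns rather than as repeated fields, so the result would look like this: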

+------------+------------------+------------------+--------------------+-----------+-----------+
| thing.name | thing.dt_a       | thing.dt_b       | total_games_played | games_won | game_time |
+------------+------------------+------------------+--------------------+-----------+-----------+
| gameLost   | 1470625311138000 | 1470620345566000 | 3                  | 2         | 44        |
+------------+------------------+------------------+--------------------+-----------+-----------+

How can I do this?
Thanks!

2 Answers:

Answer 0: (score: 3)

Standard SQL makes this easier to express (uncheck "Use Legacy SQL" under "Show Options"):

WITH T AS (
  SELECT STRUCT(
    "gameLost" AS name,
    ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
      STRUCT("total_games", STRUCT("3", NULL)),
      STRUCT("games_won", STRUCT("2", NULL)),
      STRUCT("game_time", STRUCT("44", NULL))] AS params,
    1470625311138000 AS dt_a,
    1470620345566000 AS dt_b) AS thing
)
SELECT
  (SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
  thing.params[OFFSET(0)].val.str_val AS total_games_played,
  thing.params[OFFSET(1)].val.str_val AS games_won,
  thing.params[OFFSET(2)].val.str_val AS game_time
FROM T;
+-------------------------------------------------------------------------+--------------------+-----------+-----------+
|                                  thing                                  | total_games_played | games_won | game_time |
+-------------------------------------------------------------------------+--------------------+-----------+-----------+
| {"name":"gameLost","dt_a":"1470625311138000","dt_b":"1470620345566000"} | 3                  | 2         | 44        |
+-------------------------------------------------------------------------+--------------------+-----------+-----------+

If you don't know the order of the keys in the array, you can use subselects to extract the relevant values:

WITH T AS (
  SELECT STRUCT(
    "gameLost" AS name,
    ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
      STRUCT("total_games", STRUCT("3", NULL)),
      STRUCT("games_won", STRUCT("2", NULL)),
      STRUCT("game_time", STRUCT("44", NULL))] AS params,
    1470625311138000 AS dt_a,
    1470620345566000 AS dt_b) AS thing
)
SELECT
  (SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
  (SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "total_games") AS total_games_played,
  (SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "games_won") AS games_won,
  (SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "game_time") AS game_time
FROM T;
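
The sample row only populates str_val, but the schema also carries int_val. As a sketch only (not part of the original answer), the same subselect can read whichever of the two fields is set and return an integer. The games_won entry below is changed to use int_val purely for illustration:

```sql
-- Sample row; games_won stores its value in int_val (an assumption made for this example).
WITH T AS (
  SELECT STRUCT(
    "gameLost" AS name,
    ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
      STRUCT("total_games", STRUCT("3", NULL)),
      STRUCT("games_won", STRUCT(CAST(NULL AS STRING), 2)),
      STRUCT("game_time", STRUCT("44", NULL))] AS params) AS thing
)
SELECT
  thing.name,
  -- SAFE_CAST yields NULL for a missing or non-numeric string, so COALESCE falls back to int_val.
  (SELECT COALESCE(SAFE_CAST(val.str_val AS INT64), val.int_val)
   FROM UNNEST(thing.params) WHERE key = "total_games") AS total_games_played,
  (SELECT COALESCE(SAFE_CAST(val.str_val AS INT64), val.int_val)
   FROM UNNEST(thing.params) WHERE key = "games_won") AS games_won,
  (SELECT COALESCE(SAFE_CAST(val.str_val AS INT64), val.int_val)
   FROM UNNEST(thing.params) WHERE key = "game_time") AS game_time
FROM T;
```

Each scalar subquery must return at most one row, so a key that appears twice in params raises a runtime error. Adding LIMIT 1, or wrapping the value in an aggregate such as MAX, is one way around that.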

Answer 1: (score: 1)

Try the following (Legacy SQL):

SELECT
  thing.name AS name,
  thing.dt_a AS dt_a,
  thing.dt_b AS dt_b,
  MAX(IF(thing.params.key = "total_games", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS total_games_played,
  MAX(IF(thing.params.key = "games_won", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS games_won,
  MAX(IF(thing.params.key = "game_time", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS game_time
FROM YourTable

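For Standard SQL, you can try the following (inspired by Elliott's answer; the important difference is that the array is sorted by key, so the order of the values is guaranteed):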

WITH Temp AS (
  SELECT 
    (SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
    ARRAY(SELECT val.str_val AS val FROM UNNEST(thing.params) ORDER BY key) AS params
  FROM YourTable
)
SELECT 
  thing, 
  params[OFFSET(2)] AS total_games_played,
  params[OFFSET(1)] AS games_won,
  params[OFFSET(0)] AS game_time
FROM Temp 

Note: if there are other keys in params, you should add a WHERE clause to the SELECT inside ARRAY.
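
To illustrate the note, here is a minimal sketch with one extra key in params. The "level" entry and the inline YourTable CTE are made up for the example. Without the WHERE filter, "level" would sort between "games_won" and "total_games" and shift the offsets:

```sql
-- Hypothetical sample data: same row as in the question plus an extra "level" key.
WITH YourTable AS (
  SELECT STRUCT(
    "gameLost" AS name,
    ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
      STRUCT("total_games", STRUCT("3", NULL)),
      STRUCT("level", STRUCT("7", NULL)),
      STRUCT("games_won", STRUCT("2", NULL)),
      STRUCT("game_time", STRUCT("44", NULL))] AS params,
    1470625311138000 AS dt_a,
    1470620345566000 AS dt_b) AS thing
),
Temp AS (
  SELECT
    (SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
    -- Keep only the wanted keys, then sort so each key lands at a fixed offset.
    ARRAY(
      SELECT val.str_val
      FROM UNNEST(thing.params)
      WHERE key IN ("total_games", "games_won", "game_time")
      ORDER BY key) AS params
  FROM YourTable
)
SELECT
  thing,
  params[OFFSET(2)] AS total_games_played,  -- "total_games" sorts last
  params[OFFSET(1)] AS games_won,
  params[OFFSET(0)] AS game_time            -- "game_time" sorts first
FROM Temp;
```

If one of the expected keys can be absent from a row, the remaining values move to lower offsets and OFFSET raises an error once the index is out of range. SAFE_OFFSET returns NULL instead of failing, but the per-key subselects from Answer 0 are the safer choice in that case.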