I have some BigQuery data in the following format:
"thing": [
  {
    "name": "gameLost",
    "params": [
      {
        "key": "total_games",
        "val": {
          "str_val": "3",
          "int_val": null
        }
      },
      {
        "key": "games_won",
        "val": {
          "str_val": "2",
          "int_val": null
        }
      },
      {
        "key": "game_time",
        "val": {
          "str_val": "44",
          "int_val": null
        }
      }
    ],
    "dt_a": "1470625311138000",
    "dt_b": "1470620345566000"
  }
]
I know that the FLATTEN() function will produce 3 rows of output, like this:
+------------+------------------+------------------+------------------+--------------------------+--------------------------+
| thing.name | thing.dt_a       | thing.dt_b       | thing.params.key | thing.params.val.str_val | thing.params.val.int_val |
+------------+------------------+------------------+------------------+--------------------------+--------------------------+
| gameLost   | 1470625311138000 | 1470620345566000 | total_games      | 3                        | null                     |
| gameLost   | 1470625311138000 | 1470620345566000 | games_won        | 2                        | null                     |
| gameLost   | 1470625311138000 | 1470620345566000 | game_time        | 44                       | null                     |
+------------+------------------+------------------+------------------+--------------------------+--------------------------+
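Here the higher-level keys/values are repeated on a new row for each nested object.
However, I need the nested keys/values to come out as brand-new columns rather than as repeated fields, so the result would look like this: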
+------------+------------------+------------------+--------------------+-----------+-----------+
| thing.name | thing.dt_a       | thing.dt_b       | total_games_played | games_won | game_time |
+------------+------------------+------------------+--------------------+-----------+-----------+
| gameLost   | 1470625311138000 | 1470620345566000 | 3                  | 2         | 44        |
+------------+------------------+------------------+--------------------+-----------+-----------+
How can I do this?
Thanks!
Answer 0 (score: 3)
Standard SQL makes this easier to express (uncheck "Use Legacy SQL" under "Show Options"):
WITH T AS (
SELECT STRUCT(
"gameLost" AS name,
ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
STRUCT("total_games", STRUCT("3", NULL)),
STRUCT("games_won", STRUCT("2", NULL)),
STRUCT("game_time", STRUCT("44", NULL))] AS params,
1470625311138000 AS dt_a,
1470620345566000 AS dt_b) AS thing
)
SELECT
(SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
thing.params[OFFSET(0)].val.str_val AS total_games_played,
thing.params[OFFSET(1)].val.str_val AS games_won,
thing.params[OFFSET(2)].val.str_val AS game_time
FROM T;
+-------------------------------------------------------------------------+--------------------+-----------+-----------+
| thing | total_games_played | games_won | game_time |
+-------------------------------------------------------------------------+--------------------+-----------+-----------+
| {"name":"gameLost","dt_a":"1470625311138000","dt_b":"1470620345566000"} | 3 | 2 | 44 |
+-------------------------------------------------------------------------+--------------------+-----------+-----------+
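One caveat with the positional version: `OFFSET(n)` raises an error when the array has fewer than n + 1 elements. If some records might carry fewer params, `SAFE_OFFSET` returns NULL instead. A minimal sketch, assuming a trimmed-down version of the sample row with only two params (the shortened array is my own variation, not data from the question):

```sql
WITH T AS (
  SELECT STRUCT(
    "gameLost" AS name,
    ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
      STRUCT("total_games", STRUCT("3", NULL)),
      STRUCT("games_won", STRUCT("2", NULL))] AS params  -- deliberately only two elements
  ) AS thing
)
SELECT
  thing.name AS name,
  -- SAFE_OFFSET yields NULL for an out-of-range index instead of failing the query
  thing.params[SAFE_OFFSET(0)].val.str_val AS total_games_played,
  thing.params[SAFE_OFFSET(1)].val.str_val AS games_won,
  thing.params[SAFE_OFFSET(2)].val.str_val AS game_time  -- no third element here
FROM T;
```

With this input, game_time should come back as NULL rather than aborting the query. This still relies on the keys always arriving in the same order.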
If you don't know the order of the keys in the array, you can use subselects to extract the relevant values:
WITH T AS (
SELECT STRUCT(
"gameLost" AS name,
ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
STRUCT("total_games", STRUCT("3", NULL)),
STRUCT("games_won", STRUCT("2", NULL)),
STRUCT("game_time", STRUCT("44", NULL))] AS params,
1470625311138000 AS dt_a,
1470620345566000 AS dt_b) AS thing
)
SELECT
(SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
(SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "total_games") AS total_games_played,
(SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "games_won") AS games_won,
(SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "game_time") AS game_time
FROM T;
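The sample data keeps every number in str_val and leaves int_val null. If you want numeric columns, or if some params might populate int_val instead, the same subselect pattern can be combined with `COALESCE` and `SAFE_CAST`. This is only a sketch: the row whose value sits in int_val is a hypothetical variation I added for illustration.

```sql
WITH T AS (
  SELECT STRUCT(
    "gameLost" AS name,
    ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
      STRUCT("total_games", STRUCT("3", NULL)),
      -- Hypothetical variation: this value arrives in int_val instead of str_val
      STRUCT("games_won", STRUCT(CAST(NULL AS STRING), 2)),
      STRUCT("game_time", STRUCT("44", NULL))] AS params
  ) AS thing
)
SELECT
  thing.name AS name,
  -- Prefer int_val; otherwise parse str_val (SAFE_CAST gives NULL on non-numeric text)
  (SELECT COALESCE(val.int_val, SAFE_CAST(val.str_val AS INT64))
   FROM UNNEST(thing.params) WHERE key = "total_games") AS total_games_played,
  (SELECT COALESCE(val.int_val, SAFE_CAST(val.str_val AS INT64))
   FROM UNNEST(thing.params) WHERE key = "games_won") AS games_won,
  (SELECT COALESCE(val.int_val, SAFE_CAST(val.str_val AS INT64))
   FROM UNNEST(thing.params) WHERE key = "game_time") AS game_time
FROM T;
```

Keep in mind that a scalar subquery fails if it returns more than one row, so if a key can appear twice in the same params array you would need something like `LIMIT 1` or an aggregate inside each subselect.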
Answer 1 (score: 1)
Try the following (Legacy SQL):
SELECT
thing.name AS name,
thing.dt_a AS dt_a,
thing.dt_b AS dt_b,
MAX(IF(thing.params.key = "total_games", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS total_games_played,
MAX(IF(thing.params.key = "games_won", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS games_won,
MAX(IF(thing.params.key = "game_time", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS game_time
FROM YourTable
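For Standard SQL you can try the following (inspired by Elliott's answer; the important difference is that the array is sorted by key, so the order of the values is guaranteed):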
WITH Temp AS (
SELECT
(SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
ARRAY(SELECT val.str_val AS val FROM UNNEST(thing.params) ORDER BY key) AS params
FROM YourTable
)
SELECT
thing,
params[OFFSET(2)] AS total_games_played,
params[OFFSET(1)] AS games_won,
params[OFFSET(0)] AS game_time
FROM Temp
Note: if params contains other keys, you should add a WHERE clause to the SELECT inside ARRAY.
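A sketch of what that WHERE clause could look like. The inline YourTable and the extra "level" key are made up here so the query is self-contained; against a real table you would drop the first CTE.

```sql
WITH YourTable AS (
  SELECT STRUCT(
    "gameLost" AS name,
    ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
      STRUCT("total_games", STRUCT("3", NULL)),
      STRUCT("level", STRUCT("7", NULL)),  -- hypothetical extra key, not in the question
      STRUCT("games_won", STRUCT("2", NULL)),
      STRUCT("game_time", STRUCT("44", NULL))] AS params,
    1470625311138000 AS dt_a,
    1470620345566000 AS dt_b) AS thing
),
Temp AS (
  SELECT
    (SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
    ARRAY(
      SELECT val.str_val
      FROM UNNEST(thing.params)
      -- Keep only the keys we pivot on, so unrelated keys cannot shift the offsets
      WHERE key IN ("total_games", "games_won", "game_time")
      ORDER BY key) AS params
  FROM YourTable
)
SELECT
  thing,
  params[OFFSET(2)] AS total_games_played,  -- sorted order: game_time, games_won, total_games
  params[OFFSET(1)] AS games_won,
  params[OFFSET(0)] AS game_time
FROM Temp;
```

Without the filter, "level" would sort between "games_won" and "total_games", and OFFSET(2) would pick up the wrong value.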