Extracting variant/JSON data in Snowflake

Time: 2021-02-02 21:08:56

Tags: snowflake-cloud-data-platform

I have a column, say variant_column, whose data looks exactly like this (this would be a single row of data):

[
  {
    "key": "column_name_1",
    "value": "value_goes_here"
  },
  {
    "key": "metadata",
    "value": "{this_could_be_another_huge_json_here}"
  },
  {
    "key": "column_name_2",
    "value": "value_goes_here_again"
  },
  {
    "key": "column_name_3",
    "value": "value_goes_here_yet_again"
  }
]

How can I query for the value of a specific key? That is, I would like my query result to look like this:

column_name_1
value_goes_here
more values...for each row of data

Every row always has "key": "column_name_1" with an associated value that can change. I have tried:

get_path(variant_column, '"key": "column_name_1"')
get_path(variant_column, 'column_name_1')

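and a few other variations, but the fact that every entry in the variant has both a "key" and a "value" is confusing me. How do I create a single column from "column_name_1" (the key is always "column_name_1") and its associated "value" (the field is always called "value", but the actual data in it varies)?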

1 Answer:

Answer 0: (score: 1)

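This is not an ideal JSON structure, but you can still flatten it, use a few CASE expressions, and then aggregate the results back together. Try something like this: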

WITH x AS (
SELECT parse_json('[
  {
    "key": "column_name_1",
    "value": "value_goes_here"
  },
  {
    "key": "metadata",
    "value": "{this_could_be_another_huge_json_here}"
  },
  {
    "key": "column_name_2",
    "value": "value_goes_here_again"
  },
  {
    "key": "column_name_3",
    "value": "value_goes_here_yet_again"
  }
]') as var
)
-- FLATTEN emits one row per array element; each CASE picks out the "value"
-- of the element whose "key" matches, and MAX collapses them into one row.
SELECT f.seq,
       MAX(CASE WHEN f.value:key::varchar = 'column_name_1' THEN f.value:value::varchar END) as column_name_1,
       MAX(CASE WHEN f.value:key::varchar = 'column_name_2' THEN f.value:value::varchar END) as column_name_2,
       MAX(CASE WHEN f.value:key::varchar = 'column_name_3' THEN f.value:value::varchar END) as column_name_3,
       MAX(CASE WHEN f.value:key::varchar = 'metadata' THEN f.value:value::variant END) as metadata
FROM x,
LATERAL FLATTEN(input=>var) f
-- f.seq identifies the input row, so this yields one output row per source row.
GROUP BY f.seq
;
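
As for why the attempts in the question return nothing: GET_PATH addresses object field names and array positions, not field values, so a path such as 'column_name_1' looks for a field with that name rather than an element whose "key" equals it. If only column_name_1 is needed, a filter on the flattened rows may be enough, with no CASE or GROUP BY. The following is a minimal sketch under stated assumptions: my_table and id are hypothetical names standing in for the real table and its row identifier, and the CTE exists only to make the example self-contained.

```sql
-- Sketch only: my_table and id are hypothetical names, not taken from the question.
WITH my_table AS (
    SELECT 1 AS id,
           PARSE_JSON('[
             {"key": "column_name_1", "value": "value_goes_here"},
             {"key": "column_name_2", "value": "value_goes_here_again"}
           ]') AS variant_column
)
SELECT t.id,
       f.value:value::varchar AS column_name_1
FROM my_table t,
     LATERAL FLATTEN(input => t.variant_column) f
-- Keep only the array element whose "key" field is column_name_1.
WHERE f.value:key::varchar = 'column_name_1';
```

Note that this filter drops any row that has no column_name_1 entry, which is fine here because the question states the key is always present. If the metadata value is itself a JSON document stored as a string, wrapping it in TRY_PARSE_JSON may be worth trying in order to query inside it.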