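Retrieving sub-fields from parsed JSON in Snowflake

Time: 2020-05-05 11:58:14

Tags: json snowflake-cloud-data-platform

I am having some trouble extracting the individual components of an address field.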

with data as (
  select PARSE_JSON('{ "data" : [ [ "row-ea6u~fkaa~32ry", "00000000-0000-0000-01B7-0B8F94EE5292", 0, 1486063689, null, 1486063689, null, "{ }", "410", "21206", "Frankford", "2", "NORTHEASTERN", [ "{\\"address\\": \\"4509 BELAIR ROAD\\", \\"city\\": \\"Baltimore\\", \\"state\\": \\"MD\\", \\"zip\\": \\"\\"}", null, null, null, true ], null, null, null ] ] }') as j
)
select f.value[1][0]::text
from data d,
    lateral flatten(input=> d.j:data, recursive=>TRUE) f;

f.value[1][0] contains the address field:

{"address": "4509 BELAIR ROAD", "city": "Baltimore", "state": "MD", "zip": ""}

But f.value[1][0].address returns null.

How can I get the individual attributes of f.value[1], such as the address, city, and so on?

2 answers:

Answer 0: (score: 1)

You can work through it step by step by following this article: https://community.snowflake.com/s/article/Using-lateral-flatten-to-extract-data-from-JSON-internal-field

Hope this helps!

Answer 1: (score: 1)

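The problem is that you have three nested levels of data, and you should not use recursive=>TRUE: the objects at the different levels are not alike, so the recursive output gives you nothing useful. You need to separate the levels manually.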
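To see what a recursive flatten actually emits, it can help to list each element's path and type. The following is only a sketch on a shortened, hypothetical sample (not the asker's full row); it relies on the PATH and VALUE columns that FLATTEN returns and on TYPEOF.

```sql
-- Hypothetical, shortened sample: one row holding two scalars and a nested array.
with data as (
  select parse_json('{ "data": [ [ "row-1", 0, [ "{\\"city\\": \\"Baltimore\\"}", null, true ] ] ] }') as j
)
select f.path                        -- position of the element inside d.j:data
      ,typeof(f.value) as value_type -- ARRAY, VARCHAR, INTEGER, BOOLEAN, NULL_VALUE, ...
from data d,
    lateral flatten(input => d.j:data, recursive => true) f;
```

Rows from every nesting level come back in a single result set with differing types, which is why a fixed expression such as f.value[1][0] is hard to apply to all of them.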

with data as (
  select 
  PARSE_JSON('{ data: [ [ "row-ea6u~fkaa~32ry", "0B8F94EE5292", 0, 1486063689, null, 1486063689, null, "{ }", "410", "21206", "Frankford", "2", "NORTHEASTERN", [ "{\\"address\\": \\"4509 BELAIR ROAD\\", \\"city\\": \\"Baltimore\\", \\"state\\": \\"MD\\", \\"zip\\": \\"\\"}", null, null, null, true ], null, null, null ]]}') as j
), data_rows as (
    select f.value as r
    from data d,
    lateral flatten(input=> d.j:data) f
)
select dr.r[0] as v0
    ,dr.r[1] as v1
    ,dr.r[2] as v2
    ,dr.r[3] as v3
    ,f.value as addr_n
from data_rows dr,
    lateral flatten(input=> dr.r[13]) f;

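This gets all the rows (your example has only one) and unpacks the values of interest (you will need to complete this part and give v0-vN meaningful names), but you are still left with an array of addresses: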

V0                    V1              V2  V3          ADDR_N
"row-ea6u~fkaa~32ry"  "0B8F94EE5292"  0   1486063689  "{\"address\": \"4509 BELAIR ROAD\", \"city\": \"Baltimore\", \"state\": \"MD\", \"zip\": \"\"}"
"row-ea6u~fkaa~32ry"  "0B8F94EE5292"  0   1486063689  null
"row-ea6u~fkaa~32ry"  "0B8F94EE5292"  0   1486063689  null
"row-ea6u~fkaa~32ry"  "0B8F94EE5292"  0   1486063689  null
"row-ea6u~fkaa~32ry"  "0B8F94EE5292"  0   1486063689  true
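The quoted ADDR_N value in the first row is a JSON string that merely contains JSON text, not a JSON object, which would explain why the .address lookup in the question returned null. A minimal sketch of the difference, using a hypothetical one-element array and the explicit ::string cast as a precaution:

```sql
-- Hypothetical sample: an array whose only element is a string containing JSON text.
with t as (
  select parse_json('[ "{\\"address\\": \\"4509 BELAIR ROAD\\"}" ]') as a
)
select typeof(a[0])                         as raw_type      -- VARCHAR: a string, not an OBJECT
      ,a[0]:address                         as direct_lookup -- NULL: a string has no fields
      ,parse_json(a[0]::string):address::text as parsed_lookup -- field of the decoded object
from t;
```

Only after the second PARSE_JSON call does the path expression have an object to look into.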

To decode the address string as JSON, use parse_json(f.value) as addr_n; you can then split it apart like this:

with data as (
  select 
  PARSE_JSON('{ data: [ [ "row-ea6u~fkaa~32ry", "0B8F94EE5292", 0, 1486063689, null, 1486063689, null, "{ }", "410", "21206", "Frankford", "2", "NORTHEASTERN", [ "{\\"address\\": \\"4509 BELAIR ROAD\\", \\"city\\": \\"Baltimore\\", \\"state\\": \\"MD\\", \\"zip\\": \\"\\"}", null, null, null, true ], null, null, null ]]}') as j
), data_rows as (
    select f.value as r
    from data d,
    lateral flatten(input=> d.j:data) f
)
select dr.r[0] as v0
    ,dr.r[1] as v1
    ,dr.r[2] as v2
    ,dr.r[3] as v3
    ,parse_json(f.value) as addr_n
    ,addr_n:address::text as addr_address
    ,addr_n:city::text as addr_city
    ,addr_n:state::text as addr_state
    ,addr_n:zip::text as addr_zip  
from data_rows dr,
    lateral flatten(input=> dr.r[13]) f;

You can either leave the addr_n dummy column in, or swap it out by cut and paste, like this:

    ,parse_json(f.value):address::text as addr_address
    ,parse_json(f.value):city::text as addr_city
    ,parse_json(f.value):state::text as addr_state
    ,parse_json(f.value):zip::text as addr_zip
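The flattened address array also contains null and true entries, which end up as rows with empty address columns. If those rows are unwanted, one possible approach (an assumption on my part, not something the asker requested) is to keep only the string elements before decoding. The sample below is a shortened, hypothetical array under the key r.

```sql
-- Hypothetical sample: one address string followed by non-address entries.
with data as (
  select parse_json('{ "r": [ "{\\"address\\": \\"4509 BELAIR ROAD\\", \\"city\\": \\"Baltimore\\"}", null, true ] }') as j
)
select parse_json(f.value::string):address::text as addr_address
      ,parse_json(f.value::string):city::text    as addr_city
from data d,
    lateral flatten(input => d.j:r) f
where typeof(f.value) = 'VARCHAR';  -- drop the null and boolean elements
```

If some of the strings might not hold valid JSON, TRY_PARSE_JSON could be used in place of PARSE_JSON so that bad values yield NULL instead of an error.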