Flattening JSON data in Snowflake

Date: 2021-05-02 15:25:03

Tags: snowflake-cloud-data-platform

I am trying to flatten JSON data in Snowflake:

JSON data:

 {
    "empDetails": [
        {
            "kind": "person",
            "fullName": "John Doe",
            "age": 22,
            "gender": "Male",
            "phoneNumber": {
                "areaCode": "206",
                "number": "1234567"
            },
            "children": [
                {
                    "name": "Jane",
                    "gender": "Female",
                    "age": "6"
                },
                {
                    "name": "John",
                    "gender": "Male",
                    "age": "15"
                }
            ],
            "citiesLived": [
                {
                    "place": "Seattle",
                    "yearsLived": [
                        "1995"
                    ]
                },
                {
                    "place": "Stockholm",
                    "yearsLived": [
                        "2005"
                    ]
                }
            ]
        },
        {
            "kind": "person",
            "fullName": "Mike Jones",
            "age": 35,
            "gender": "Male",
            "phoneNumber": {
                "areaCode": "622",
                "number": "1567845"
            },
            "children": [
                {
                    "name": "Earl",
                    "gender": "Male",
                    "age": "10"
                },
                {
                    "name": "Sam",
                    "gender": "Male",
                    "age": "6"
                },
                {
                    "name": "Kit",
                    "gender": "Male",
                    "age": "8"
                }
            ],
            "citiesLived": [
                {
                    "place": "Los Angeles",
                    "yearsLived": [
                        "1989",
                        "1993",
                        "1998",
                        "2002"
                    ]
                },
                {
                    "place": "Washington DC",
                    "yearsLived": [
                        "1990",
                        "1993",
                        "1998",
                        "2008"
                    ]
                },
                {
                    "place": "Portland",
                    "yearsLived": [
                        "1993",
                        "1998",
                        "2003",
                        "2005"
                    ]
                },
                {
                    "place": "Austin",
                    "yearsLived": [
                        "1973",
                        "1998",
                        "2001",
                        "2005"
                    ]
                }
            ]
        },
        {
            "kind": "person",
            "fullName": "Anna Karenina",
            "age": 45,
            "gender": "Female",
            "phoneNumber": {
                "areaCode": "425",
                "number": "1984783"
            },
            "citiesLived": [
                {
                    "place": "Stockholm",
                    "yearsLived": [
                        "1992",
                        "1998",
                        "2000",
                        "2010"
                    ]
                },
                {
                    "place": "Russia",
                    "yearsLived": [
                        "1998",
                        "2001",
                        ""
                    ]
                },
                {
                    "place": "Austin",
                    "yearsLived": [
                        "1995",
                        "1999"
                    ]
                }
            ]
        }
    ]
}

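I am able to flatten most of the data, except for the yearsLived column/array. For that last column I get NULL values.

Here is what I have tried so far: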

  select empd.value:kind,
  empd.value:fullName,
  empd.value:age,
  empd.value:gender,   
  empd.value:phoneNumber,
  empd.value:phoneNumber.areaCode, 
  empd.value:phoneNumber.number ,
  empd.value:children, 
  chldrn.value:name,
  chldrn.value:gender,
  chldrn.value:age,
  city.value:place,
  yr.value:yearsLived
  from my_json emp,
  lateral flatten(input=>emp.Json_data:empDetails) empd , 
  lateral flatten(input=>empd.value:children, OUTER => TRUE) chldrn,   
  lateral flatten(input=>empd.value:citiesLived) city,
  lateral flatten(input=>city.value:yearsLived) yr -- not getting data for this array

Can someone help me understand why I am getting NULL values for the yearsLived array? I am not sure whether I am missing something here.

2 answers:

Answer 0: (score: 0)

Your query returns the column

yr.value:yearsLived

as if yr.value were an OBJECT with a yearsLived field.

But you have already expanded the yearsLived field into rows with

lateral flatten(input=>city.value:yearsLived) yr 

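So yr.value is actually just a VARIANT containing the year itself, and the path yr.value:yearsLived finds nothing, which is why it returns NULL. You can select yr.value as is, or wrap it in TO_NUMBER or TO_VARCHAR to get a more precise type.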
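
The change described above can be sketched as follows. This is a minimal example, assuming the my_json table and Json_data column from the question; the column aliases are made up for illustration. The children flatten is left out here only to keep the output short, since flattening two sibling arrays in the same query multiplies the rows. The sample data contains an empty-string year, so the numeric column uses TRY_TO_NUMBER, which should yield NULL for that entry instead of raising an error.

```sql
-- Sketch only: table and column names come from the question,
-- the aliases (full_name, place, ...) are illustrative.
select empd.value:fullName::string       as full_name,
       city.value:place::string          as place,
       yr.value::string                  as year_lived,      -- the array element itself
       try_to_number(yr.value::string)   as year_lived_num   -- NULL when the text is not numeric
from my_json emp,
     lateral flatten(input => emp.Json_data:empDetails) empd,
     lateral flatten(input => empd.value:citiesLived) city,
     lateral flatten(input => city.value:yearsLived) yr;
```

To get back to the original query, the only change needed there is replacing yr.value:yearsLived with yr.value (optionally with a cast).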

Answer 1: (score: 0)

Why not try this:

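-- Build a one-row test table holding a single citiesLived entry
create or replace table json_tab as
select parse_json('{ "place": "Austin","yearsLived": [ "1995","1999"]}') as years;

-- Read the first element of the array by index and cast it to an integer
select years:yearsLived[0]::int from json_tab;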

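Because yearsLived is an array, you need to access its elements by index if you want a specific value, or use an array function such as FLATTEN to explode it into rows.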


With the flatten function:

select years, v.value::string 
from json_tab, 
lateral flatten(input =>years:yearsLived ) v;
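
If the position of each year within the array matters, the same flatten can return it as well. A small sketch, assuming the json_tab table created above; INDEX is one of the standard output columns of FLATTEN, and the aliases are made up for illustration.

```sql
-- Sketch only: relies on the json_tab table from this answer.
select years:place::string as place,
       v.index             as year_idx,    -- zero-based position in the yearsLived array
       v.value::string     as year_lived   -- the element itself, cast from VARIANT
from json_tab,
     lateral flatten(input => years:yearsLived) v;
```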
