U-SQL - 从json-array中提取数据

时间:2016-07-01 14:27:04

标签: json azure-data-lake u-sql

已经尝试过建议的JSONPath选项,但似乎JSONExtractor只识别根级别。在我的情况下,我必须处理嵌套的json结构,也有一个数组(见下面的例子)。没有多个中间文件的任何提取选项?

"relation": {
"relationid": "123456",
"name": "relation1",
"addresses": {
    "address": [{
        "addressid": "1",
        "street": "Street 1",
        "postcode": "1234 AB",
        "city": "City 1"
        },
    {
        "addressid": "2",
        "street": "Street 2",
        "postcode": "5678 CD",
        "city": "City 2"
    }]
}}

SELECT relationid,addressid,street,postcode,city?

1 个答案:

答案 0 :(得分:11)

将JSON片段修复为:

{
    "relation": {
        "relationid": "123456",
        "name": "relation1",
        "addresses": {
            "address": [{
                "addressid": "1",
                "street": "Street 1",
                "postcode": "1234 AB",
                "city": "City 1"
            }, {
                "addressid": "2",
                "street": "Street 2",
                "postcode": "5678 CD",
                "city": "City 2"
            }]
        }
    }
}

并将其放入文件中,以下脚本将为您提供所需的内容。请注意,您需要向下导航结构以携带更高级别的项目,一旦遇到数组,如果您只需要具有该数组的那些项,则需要CROSS APPLY EXPLODE,如果您需要具有该数组的行,则需要OUTER APPLY EXPLODE缺少数组。

DECLARE @input string = @"/temp/stackoverflow.json";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

@json = 
EXTRACT relationid int, 
        name string, 
        addresses string
  FROM @input 
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("relation");

@relation =
SELECT relationid,
       name,
       Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(addresses)["address"] AS address_array
FROM @json;

@addresses = 
SELECT relationid, 
       name, 
       Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(address) AS address
FROM @relation
     CROSS APPLY
        EXPLODE (Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(address_array).Values) AS A(address);

@result =
SELECT relationid,
       name,
       address["addressid"]AS addressid,
       address["street"]AS street,
       address["postcode"]AS postcode,
       address["city"]AS city
FROM @addresses;

OUTPUT @result
TO "/users/temp/st_out.csv"
USING Outputters.Csv();