U-SQL - Extract data from a complex nested JSON file

Time: 2018-04-10 10:57:51

Tags: json azure u-sql

My JSON is structured as follows:

{
    "First": "xxxx",
    "Country": "XX",
    "Loop": {
        "Links": [
            {
                "Url": "xxxx",
                "Time": 123
            },
            {
                "Url": "xxxx",
                "Time": 123
            }
        ],
        "TotalTime": 123,
        "Date": "2018-04-09T10:29:39.0233082+00:00"
    }
}

I want to extract the following properties:

First
Country
Url and Time for each object in the array
TotalTime
Date

Here is my query:

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

@extration = 
EXTRACT 
    jsonString string 
FROM @"/storage-api/input.json" 
USING Extractors.Tsv(quoting:false);

@cleanUp = SELECT jsonString FROM @extration WHERE (!jsonString.Contains("Part: h" ) AND jsonString!= "465}");

@jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS obj FROM @cleanUp;

@columnized = SELECT 
        obj["First"] AS first,
        obj["Country"] AS country
FROM @jsonify;

OUTPUT @columnized
TO @"/storage-api/outputs/tpe1-output.csv"
USING Outputters.Csv();

But this query only extracts the first two properties, and I don't know how to query the nested data inside "Loop".

1 answer:

Answer 0: (score: 1)

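You can do this with the MultiLevelJsonExtractor (commented here) and a JSON path (e.g. Loop.Links[*]). The MultiLevelJsonExtractor has a nice feature: if a node is not found under your base path, it checks for it recursively. In this case that is how First, Country, Date and TotalTime, which sit outside the Links elements, get resolved. I am not sure, though, how the performance would scale to either large JSON documents or a large volume of JSON documents.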

Try this:

DECLARE @input string = "/input/input65.json";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

USING Microsoft.Analytics.Samples.Formats.Json;

@result =
    EXTRACT First string,
            Country string,
            Date DateTime,
            Url string,
            Time string,
            TotalTime int
    FROM @input
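    USING new MultiLevelJsonExtractor("Loop.Links[*]",  // base path: one row per element of Loop.Links
          false,        // bypassWarning flag, left at false
          "First",      // not under the base path, resolved by the recursive lookup
          "Country",    // likewise
          "Date",       // sibling of Links inside Loop
          "Url",        // property of each Links element
          "Time",       // property of each Links element
          "TotalTime"   // sibling of Links inside Loop
          );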


OUTPUT @result
TO "/output/output.csv"
USING Outputters.Csv();

My results:

[Image: Results]
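
For comparison, the same flattening can probably also be written without the multi-level extractor, using JsonExtractor, JsonFunctions.JsonTuple and CROSS APPLY EXPLODE from the same sample assembly. This is only a sketch under assumptions: it assumes the input file holds a single JSON document shaped like the one in the question, the rowset names (@top, @loop, @links, @flat), the alias l(link) and the output path are made up for illustration, and every value read through JsonTuple comes back as a string, so Date, Time and TotalTime would still need converting if typed columns are required.

```
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

DECLARE @input string = "/input/input65.json";

// Step 1: read the top-level properties; the nested Loop object
// is kept as a JSON string because the column is typed string.
@top =
    EXTRACT First string,
            Country string,
            Loop string
    FROM @input
    USING new JsonExtractor();

// Step 2: turn the Loop string into a key/value map.
@loop =
    SELECT First,
           Country,
           JsonFunctions.JsonTuple(Loop) AS loop
    FROM @top;

// Step 3: pick the scalar members of Loop and split the Links
// array into one JSON string per element.
@links =
    SELECT First,
           Country,
           loop["TotalTime"] AS TotalTime,
           loop["Date"] AS Date,
           JsonFunctions.JsonTuple(loop["Links"]).Values AS links
    FROM @loop;

// Step 4: one output row per Links element, with the parent
// values repeated on every row.
@flat =
    SELECT First,
           Country,
           Date,
           JsonFunctions.JsonTuple(link)["Url"] AS Url,
           JsonFunctions.JsonTuple(link)["Time"] AS Time,
           TotalTime
    FROM @links
         CROSS APPLY EXPLODE(links) AS l(link);

// Hypothetical output path, chosen only for this sketch.
OUTPUT @flat
TO "/output/output-explode.csv"
USING Outputters.Csv();
```

This version is longer than the MultiLevelJsonExtractor one, but each level of nesting is handled in its own step, which can make it easier to see where a value goes missing.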

HTH