U-SQL JsonTuple: how to access a specific field in a JSON array

Date: 2019-05-13 23:17:30

Tags: json u-sql

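I am extracting AVRO data that contains a JSON field from which I need to pull values. The JSON contains an array, but I do not know in what order the array's elements will appear. How can I target a specific node/value?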

For example, Filters[0] might be Category one time and AddressType another time.

I am extracting the AVRO data like this:

@rs =
    EXTRACT date DateTime,
            Body byte[]
    FROM @input_file 
    USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"
 ...

Body is JSON that can look like the following (but Category is not always Filters[0]; this is a small example, and there are 7 different kinds of "Field"):

{
    "TimeStamp": "2019-02-19T15:00:29.1067771-05:00",
    "Filters": [{
            "Operator": "eq",
            "Field": "Category",
            "Value": "Sale"
        }, {
            "Operator": "eq",
            "Field": "AddressType",
            "Value": "Home"
        }
    ]
}

My U-SQL looks like this, but it does not always work.

@keyvalues =
    SELECT JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body), 
        "TimeStamp",
        "$.Filters[?(@.Field == 'Category')].Value",
        "$.Filters[?(@.Field == 'AddressType')].Value"
        ) AS message
    FROM @rs;


@results =
    SELECT 
           message["TimeStamp"] AS TimeStamp,
           message["Filters[0].Value"] AS Category,
           message["Filters[1].Value"] AS AddressType
    FROM @keyvalues;

1 answer:

Answer 0 (score: 0)

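This does not actually answer my question, but as a workaround I modified the Microsoft "sample" JsonFunctions.JsonTuple method so that I can specify my own key names for the extracted values: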

    ///   Added - Prefix a path expression with a specified key.  Use key~$e in the expression.
    ///   eg:
    ///   JsonTuple(json, "myId~id", "myName~name")    -> field names          MAP{ {myId, 1 }, {myName, Ed } }

The modified code:

    private static IEnumerable<KeyValuePair<string, T>> ApplyPath<T>(JToken root, string path)
    {
        var keySeparatorPos = path.IndexOf("~");
        string key = null;
        var searchPath = path;
        if (keySeparatorPos > 0) // > 0, not >= 0: a leading "~" (position 0) means no key was provided, so don't parse out a key.
        {
            key = path.Substring(0, keySeparatorPos).Trim();
            searchPath = path.Substring(keySeparatorPos + 1);
        }

        // Children
        var children = SelectChildren<T>(root, searchPath);
        foreach (var token in children)
        {
            // Token => T
            var value = (T)JsonFunctions.ConvertToken(token, typeof(T));

            // Tuple(path, value)
            yield return new KeyValuePair<string, T>(key ?? token.Path, value);
        }
    }
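
The key-splitting rule in ApplyPath can be checked in isolation. The following is a minimal sketch in Python, used here only so the logic can run outside a U-SQL assembly; `split_keyed_path` is a hypothetical name and is not part of the Microsoft samples. It mirrors the IndexOf/Substring steps above: a key is parsed only when "~" appears after position 0.

```python
from typing import Optional, Tuple


def split_keyed_path(path: str) -> Tuple[Optional[str], str]:
    """Split "key~expression" into (key, expression).

    Hypothetical helper mirroring the C# above: a key is parsed only when
    "~" occurs after position 0; otherwise the key is None and the path
    is returned unchanged.
    """
    pos = path.find("~")
    if pos > 0:
        # Only the key is trimmed, as in the C# version.
        return path[:pos].strip(), path[pos + 1:]
    return None, path


print(split_keyed_path("Category~      $.Filters[?(@.Field == 'Category')].Value"))
print(split_keyed_path("TimeStamp"))
```

As in the C# version, the expression keeps whatever whitespace follows the "~", and a path without a key falls back to the default behaviour (the C# code then uses token.Path as the key).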

For example, I can now extract values such as the plan and give each one a key name of my own:

@keyvalues =
    SELECT JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body), 
        "TimeStamp",
        "EventName",
        "Plan~          $.UrlParams.plan",
        "Category~      $.Filters[?(@.Field == 'Category')].Value",
        "AddressType~   $.Filters[?(@.Field == 'AddressType')].Value"
        ) AS message
    FROM @rs;

@results =
    SELECT 
       message["TimeStamp"]     AS TimeStamp,
       message["EventName"]     AS EventName,
       message["Plan"]          AS Plan,
       message["Category"]      AS Category,
       message["AddressType"]   AS AddressType
    FROM @keyvalues;

(I have not tested what happens if the same field appears more than once in the array; that does not occur in my case.)
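
To see what the filter expressions above are selecting, here is a small Python sketch that uses only the standard `json` module. It illustrates the idea (match array elements on their Field value rather than on their index) and is not the JSONPath engine used by the samples; `filter_values` is a hypothetical helper. It returns every match as a list, so it makes no assumption about how duplicates should be handled.

```python
import json
from typing import Any, List


def filter_values(body: str, field: str) -> List[Any]:
    """Return the Value of every Filters entry whose Field equals `field`."""
    doc = json.loads(body)
    return [f.get("Value") for f in doc.get("Filters", []) if f.get("Field") == field]


# Same shape as the Body in the question, with the two filters swapped
# to show that the lookup does not depend on array position.
body = """
{
    "TimeStamp": "2019-02-19T15:00:29.1067771-05:00",
    "Filters": [
        {"Operator": "eq", "Field": "AddressType", "Value": "Home"},
        {"Operator": "eq", "Field": "Category", "Value": "Sale"}
    ]
}
"""

print(filter_values(body, "Category"))      # ['Sale']
print(filter_values(body, "AddressType"))   # ['Home']
```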