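I am extracting AVRO data that has a JSON field I need to pull values from. The JSON contains an array, but I don't know in what order the array's elements will appear. How can I target a specific node/value?
For example, Filters[0] may be Category one time and AddressType another time.
I am extracting the AVRO data like this: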
@rs =
EXTRACT date DateTime,
Body byte[]
FROM @input_file
USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"
...
Body is JSON that can look like the following (but Category is not always Filters[0]; this is a small example, and there are 7 different types of "Field"):
{
    "TimeStamp": "2019-02-19T15:00:29.1067771-05:00",
    "Filters": [{
            "Operator": "eq",
            "Field": "Category",
            "Value": "Sale"
        }, {
            "Operator": "eq",
            "Field": "AddressType",
            "Value": "Home"
        }
    ]
}
My U-SQL looks like this, but it doesn't always work.
@keyvalues =
    SELECT JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body),
               "TimeStamp",
               "$.Filters[?(@.Field == 'Category')].Value",
               "$.Filters[?(@.Field == 'AddressType')].Value"
           ) AS message
    FROM @rs;
@results =
    SELECT
        message["TimeStamp"] AS TimeStamp,
        message["Filters[0].Value"] AS Category,
        message["Filters[1].Value"] AS AddressType
    FROM @keyvalues;
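Answer 0 (score: 0)
Although this doesn't actually answer my question, as a workaround I modified the Microsoft "samples" JsonFunctions.JsonTuple method so that I can specify my own key names for the extracted values: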
/// Added - Prefix a path expression with a specified key. Use key~$e in the expression.
/// eg:
/// JsonTuple(json, "myId~id", "myName~name") -> field names MAP{ {myId, 1 }, {myName, Ed } }
The modified code:
private static IEnumerable<KeyValuePair<string, T>> ApplyPath<T>(JToken root, string path)
{
    var keySeparatorPos = path.IndexOf("~");
    string key = null;
    var searchPath = path;
    if (keySeparatorPos > 0) // Not >= 0: with just a leading "~" (i.e. no key provided), don't parse out a key.
    {
        key = path.Substring(0, keySeparatorPos).Trim();
        searchPath = path.Substring(keySeparatorPos + 1);
    }

    // Children
    var children = SelectChildren<T>(root, searchPath);
    foreach (var token in children)
    {
        // Token => T
        var value = (T)JsonFunctions.ConvertToken(token, typeof(T));

        // Tuple(key or path, value): use the caller-supplied key if there is one, else the token's path
        yield return new KeyValuePair<string, T>(key ?? token.Path, value);
    }
}
So, for example, I can access the plan value and give it the name Plan:
@keyvalues =
    SELECT JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body),
               "TimeStamp",
               "EventName",
               "Plan~ $.UrlParams.plan",
               "Category~ $.Filters[?(@.Field == 'Category')].Value",
               "AddressType~ $.Filters[?(@.Field == 'AddressType')].Value"
           ) AS message
    FROM @rs;
@results =
    SELECT
        message["TimeStamp"] AS TimeStamp,
        message["EventName"] AS EventName,
        message["Plan"] AS Plan,
        message["Category"] AS Category,
        message["AddressType"] AS AddressType
    FROM @keyvalues;
(I haven't tested what happens if the same Field appears more than once in the array; that doesn't happen in my case.)
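To make the key~path convention easier to check in isolation, here is a minimal standalone sketch of just the splitting step from the modified ApplyPath. The KeyedPath class and its Split method are hypothetical names introduced only for this illustration; they are not part of the Microsoft samples, and the sketch does not evaluate any JSONPath.

```csharp
using System;

// Hypothetical helper for illustration only; mirrors the split logic in the modified ApplyPath.
public static class KeyedPath
{
    // Splits "key~path" into (Key, Path). Key is null when no key prefix is present,
    // in which case the caller would fall back to the token's own path as the map key.
    public static (string Key, string Path) Split(string expression)
    {
        int pos = expression.IndexOf('~');
        if (pos > 0)
        {
            // The key is trimmed; the path is taken as-is after the separator (not trimmed).
            return (expression.Substring(0, pos).Trim(), expression.Substring(pos + 1));
        }
        return (null, expression);
    }

    public static void Main()
    {
        var expressions = new[]
        {
            "TimeStamp",
            "Plan~ $.UrlParams.plan",
            "Category~ $.Filters[?(@.Field == 'Category')].Value"
        };

        foreach (var expression in expressions)
        {
            var (key, path) = Split(expression);
            Console.WriteLine($"key={key ?? "(token path)"} path='{path}'");
        }
    }
}
```

Note that, as in the modified ApplyPath, only the key is trimmed: an expression written as "Plan~ $.UrlParams.plan" passes the path on with its leading space, so this relies on the path evaluator tolerating that whitespace.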