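I store my data as one JSON object per line in a file. What is a good way to extract it in a U-SQL script?
I have it working with the Text extractor (see the code below), but the JSON objects have grown and I'm now hitting the 128 KB size limit on strings. Any help would be greatly appreciated.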
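Before changing the script, it can help to find out which records actually exceed the limit. Below is a minimal local check. It assumes the limit is 128 KB (131,072 bytes) measured on the UTF-8 encoding of the string. `oversized_lines` is a hypothetical helper, not part of U-SQL.

```python
from typing import Iterable, List, Tuple

# Assumption: U-SQL's string limit is 128 KB = 131,072 bytes of UTF-8.
STRING_LIMIT_BYTES = 128 * 1024


def oversized_lines(lines: Iterable[str], limit: int = STRING_LIMIT_BYTES) -> List[Tuple[int, int]]:
    """Return (1-based line number, UTF-8 byte length) for every line longer than `limit`."""
    result = []
    for number, line in enumerate(lines, start=1):
        # Ignore the line terminator; only the record itself would become the string value.
        size = len(line.rstrip("\r\n").encode("utf-8"))
        if size > limit:
            result.append((number, size))
    return result


if __name__ == "__main__":
    with open("data.jsonl", encoding="utf-8") as f:  # hypothetical file name
        for number, size in oversized_lines(f):
            print(f"line {number}: {size} bytes")
```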
Sample data:
{ "prop1": "abc", "prop2": "xyz" }
{ "prop1": "def", "prop2": "uvw" }
U-SQL:
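REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
// Read the file (JSON Lines) line by line; '\n' as the column delimiter keeps each whole line in a single column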
@dataAsStrings =
EXTRACT jsonObjStr string
FROM @INPUT_FILE
USING Extractors.Text(delimiter:'\n');
// JsonTuple parses each string and returns a map (SqlMap<string, string>) containing only the requested properties
@jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonObjStr, "prop1", "prop2") AS rec FROM @dataAsStrings;
//Extract the fields from the Json object.
@json = SELECT
rec["prop1"] AS prop1,
rec["prop2"] AS prop2
FROM @jsonify;
Answer 0 (score: 0)
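You should write your own "hybrid" extractor that combines line-oriented extraction with JSON parsing. It reads the file line by line, parses each line as JSON, and emits only the fields you need as columns, so the whole JSON text never has to be stored in a string column.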
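In U-SQL such an extractor would be a C# class deriving from `IExtractor`, typically using Json.NET to parse each line. The sketch below shows the same per-line logic in Python only so that it can be run and checked on its own. `extract_json_lines` is a hypothetical name. Turning non-string values into JSON text is a design choice of this sketch, not something U-SQL prescribes.

```python
import io
import json
from typing import Dict, Iterator, List, Optional, TextIO


def extract_json_lines(stream: TextIO, columns: List[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one row per non-blank line, keeping only the requested columns.

    Missing or null properties become None; non-string values are re-serialized as JSON text.
    """
    for line in stream:  # line-oriented part: one record per line
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        obj = json.loads(line)  # JSON part: parse the whole record
        row: Dict[str, Optional[str]] = {}
        for name in columns:  # project only the columns the caller asked for
            value = obj.get(name)
            if value is None or isinstance(value, str):
                row[name] = value
            else:
                row[name] = json.dumps(value)
        yield row


if __name__ == "__main__":
    data = io.StringIO('{ "prop1": "abc", "prop2": "xyz" }\n{ "prop1": "def", "prop2": "uvw" }\n')
    print(list(extract_json_lines(data, ["prop1", "prop2"])))
    # [{'prop1': 'abc', 'prop2': 'xyz'}, {'prop1': 'def', 'prop2': 'uvw'}]
```

The key point is that the raw line is only ever held inside the extractor's own code. Only the projected fields become column values, so a large record no longer has to fit in a single string column.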
Answer 1 (score: -1)
I know this answer comes 1 year and 4 months late, but I hope it helps other users.
Try the following query:
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
@trial2 =
EXTRACT jsonString string FROM @"/<your-json-file>.json" USING Extractors.Tsv(quoting:false);
// These filters appear specific to the answerer's own file; adjust or remove them for your data
@cleanUp = SELECT jsonString FROM @trial2 WHERE (!jsonString.Contains("Part: h" ) AND jsonString!= "465}");
@jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS props FROM @cleanUp;
@columnized = SELECT
props["prop1"] AS prop1,
props["prop2"] AS prop2
FROM @jsonify;
OUTPUT @columnized
TO @"/out.csv"
USING Outputters.Csv();
You can check this page for details.