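In U-SQL, how can I output the elements of a JSON array as comma-separated values on a single line, instead of one element per line?
For example, the JSON file is: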
{
  "A": {
    "A1": "1",
    "A2": 0
  },
  "B": {
    "B1": "1",
    "B2": 0
  },
  "C": {
    "C1": [
      { "D1": "1" },
      { "D2": "2" },
      { "D3": "3" },
      { "D4": "4" },
      { "D5": "5" },
      { "D6": "6" },
      { "D7": "7" }
    ]
  }
}
The code that processes the C1 array is the following:
@sql =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(C)["C1"] AS C1_array
    FROM @json;

OUTPUT @sql TO "test.txt" USING Outputters.Csv(quoting: false);

@sql2 =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(C1_array) AS C1
    FROM @sql
    CROSS APPLY
    EXPLODE (Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(C1_array).Values) AS D(C1);

@result =
    SELECT C1["D1"] AS D1,
           C1["D2"] AS D2,
           C1["D3"] AS D3,
           C1["D4"] AS D4,
           C1["D5"] AS D5,
           C1["D6"] AS D6,
           C1["D7"] AS D7
    FROM @sql2;

OUTPUT @result TO "output.txt" USING Outputters.Text();
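The result is that the array elements are printed one per line, i.e. D1 through D7 each end up on a separate line. I want D1 through D7 to appear on the same line, since they all belong to the same JSON object.
That is:
1,2,3,4,5,6,7
How can I do this?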
Answer 0 (score: 0)
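The important point is that the C1 array contains one item per Di. If you treat each array item as a row (which is what EXPLODE does), you get separate rows. In this case you want a single row for the whole of C1.
The following does this in two ways: once where you know what the Ds are, and once where you do not know them but still want them in a single row (in that case, all in a single cell).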
REFERENCE ASSEMBLY JSONBlog.[Newtonsoft.Json];
REFERENCE ASSEMBLY JSONBlog.[Microsoft.Analytics.Samples.Formats];

// Get one row per C and get the C1 array as a column
@d =
    EXTRACT C1 string
    FROM "/Temp/ABCD.txt"
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("C");

// Keep one row per C and get all the items from within the C1 array
@d =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(C1, "[*].*") AS DMap
    FROM @d;

// Get the individual items as separate columns
@d1 =
    SELECT DMap["[0].D1"] AS D1,
           DMap["[1].D2"] AS D2,
           DMap["[2].D3"] AS D3,
           DMap["[3].D4"] AS D4,
           DMap["[4].D5"] AS D5,
           DMap["[5].D6"] AS D6,
           DMap["[6].D7"] AS D7
    FROM @d;

// Keep it generic and get all items in a single column
@d2 =
    SELECT String.Join("\t", DMap.Values) AS Ds
    FROM @d;

OUTPUT @d1
TO "/Temp/D-Out1.tsv"
USING Outputters.Tsv();

OUTPUT @d2
TO "/Temp/D-Out2.tsv"
USING Outputters.Tsv(quoting:false);
As you can see, the JsonTuple function can take a JSONPath expression, and it then uses all of the paths it finds as the keys of the resulting map.
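To make the shape of that map easier to picture, here is a minimal Python sketch (plain Python, not U-SQL and not the samples library). It assumes the map is keyed by "[index].name", as the keys used in the script above suggest. The helper name flatten_array is hypothetical and only imitates that key format for this one input; it is not how JsonTuple is implemented.

```python
import json


def flatten_array(array_json):
    """Hypothetical helper: map "[index].key" -> value for a JSON array of objects.

    Only mimics the key format used in the U-SQL script ("[0].D1", "[1].D2", ...).
    """
    flat = {}
    for index, item in enumerate(json.loads(array_json)):
        for key, value in item.items():
            flat["[{}].{}".format(index, key)] = value
    return flat


# The C1 array from the question, as a JSON string.
c1 = ('[{"D1": "1"}, {"D2": "2"}, {"D3": "3"}, {"D4": "4"}, '
      '{"D5": "5"}, {"D6": "6"}, {"D7": "7"}]')

dmap = flatten_array(c1)

# Known keys: pick individual values, like the @d1 rowset.
d1 = dmap["[0].D1"]

# Unknown keys: join every value into one cell, like the @d2 rowset
# (a comma is used here to match the output the question asks for).
row = ",".join(str(value) for value in dmap.values())
print(row)  # prints 1,2,3,4,5,6,7
```

Either way the whole array stays in one record: nothing is exploded into rows, so all seven values end up on a single line.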