How to flatten JSON to CSV using U-SQL

Time: 2017-05-16 01:13:53

Tags: json csv azure azure-data-lake u-sql

I can extract some of the data using Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple, but I'm having trouble flattening the entire file.

Here is the file format I'm working with:

{
 "SourceUrl":"http://www.unittest.org/test.html",
 "Title":"Unit Test File",
 "Guest":"Unit Test Guest",
 "PublishDate":"2017-05-15T00:00:00",
 "TranscriptionSections":[  
    {  
     "SectionStartTime":"00:00:03",
     "Sentences":[  
        {  
           "Text":"Intro."
        },
        {  
           "Text":"Sentence one"
        },
        {  
           "Text":"Sentence two"
        }
     ]
  },
  {  
     "SectionStartTime":"00:04:46",
     "Sentences":[  
        {  
           "Text":"Sentence three"
        },
        {  
           "Text":"Sentence four"
        }
     ]
  }
 ],
 "Categories":null
}

What I want is one row per "Text" value (five in total), each including its 'SectionStartTime' and all of the top-level properties ('PublishDate', 'Guest', ...).

So far, I can get one row per 'SectionStartTime' with this:

USE econosphere;

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @in string="adl://abc.azuredatalakestore.net/data/20170515UnitTest.json";

DECLARE @out string = "adl://abc.azuredatalakestore.net/processed/20170515UnitTest.csv";

@ep = EXTRACT
Title string,
SourceUrl string,
Guest string,
PublishDate DateTime,
TranscriptionSections string
FROM @in
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

@epAndTransctripts =
    SELECT Title,
        SourceUrl,
        Guest,
        PublishDate,
        Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(TranscriptionSections).Values AS TranscriptionSections_arr
    FROM @ep;

@all =
    SELECT
        Title,
        SourceUrl,
        Guest,
        PublishDate,
        Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(sects)["SectionStartTime"] AS TranscriptionSectionTimes

    FROM @epAndTransctripts
    CROSS APPLY
        EXPLODE(TranscriptionSections_arr) AS t(sects);


OUTPUT @all
TO @out 
USING Outputters.Csv();

1 answer:

Answer 0 (score: 1)

Here is a solution that worked for me:

DECLARE @input string = "/input/data.json";

REFERENCE ASSEMBLY JSONBlog.[Newtonsoft.Json];
REFERENCE ASSEMBLY JSONBlog.[Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

@data =
EXTRACT SourceUrl string,
        Title string,
        Guest string,
        PublishDate DateTime,
        TranscriptionSections string,
        Categories string
FROM @input
USING new JsonExtractor();

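// Stage 1: explode TranscriptionSections into one row per section,
// keeping each section as a key/value map (ts_map).
@data =
SELECT SourceUrl,
       Title,
       Guest,
       PublishDate,
       Categories,
       JsonFunctions.JsonTuple(transcription_section) AS ts_map
FROM @data
     CROSS APPLY
         EXPLODE(JsonFunctions.JsonTuple(TranscriptionSections).Values) AS T(transcription_section);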

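// Stage 2: explode each section's Sentences array into one row per sentence,
// carrying the section's SectionStartTime along with it.
@data =
SELECT SourceUrl,
       Title,
       Guest,
       PublishDate,
       Categories,
       ts_map["SectionStartTime"] AS SectionStartTime,
       JsonFunctions.JsonTuple(text_item)["Text"] AS text
FROM @data
     CROSS APPLY
         EXPLODE(JsonFunctions.JsonTuple(ts_map["Sentences"]).Values) AS S(text_item);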

OUTPUT @data
TO "/output/jsondata.csv"
USING Outputters.Csv(outputHeader : true);
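
To see why two CROSS APPLY EXPLODE steps are needed, here is a rough plain-Python sketch of the same two-stage flattening applied to the sample file from the question. It is only an illustration of the row shapes, not U-SQL and not part of the samples library; the helper names explode_sections and explode_sentences are made up for this example.

```python
# Plain-Python mimic of the two CROSS APPLY EXPLODE stages (illustration only;
# the helper names below are hypothetical and do not exist in U-SQL).
doc = {
    "SourceUrl": "http://www.unittest.org/test.html",
    "Title": "Unit Test File",
    "Guest": "Unit Test Guest",
    "PublishDate": "2017-05-15T00:00:00",
    "TranscriptionSections": [
        {"SectionStartTime": "00:00:03",
         "Sentences": [{"Text": "Intro."},
                       {"Text": "Sentence one"},
                       {"Text": "Sentence two"}]},
        {"SectionStartTime": "00:04:46",
         "Sentences": [{"Text": "Sentence three"},
                       {"Text": "Sentence four"}]},
    ],
    "Categories": None,
}

TOP_LEVEL = ("SourceUrl", "Title", "Guest", "PublishDate", "Categories")


def explode_sections(document):
    """Stage 1: one row per section, repeating the top-level fields."""
    base = {key: document[key] for key in TOP_LEVEL}
    return [dict(base, ts_map=section)
            for section in document["TranscriptionSections"]]


def explode_sentences(section_rows):
    """Stage 2: one row per sentence, carrying the section's start time."""
    rows = []
    for row in section_rows:
        section = row["ts_map"]
        carried = {k: v for k, v in row.items() if k != "ts_map"}
        for sentence in section["Sentences"]:
            rows.append(dict(carried,
                             SectionStartTime=section["SectionStartTime"],
                             text=sentence["Text"]))
    return rows


section_rows = explode_sections(doc)        # 2 rows, like the query in the question
flat_rows = explode_sentences(section_rows)  # 5 rows, one per sentence

for row in flat_rows:
    print(row["SectionStartTime"], row["text"])
```

Stopping after the first stage gives one row per section, which matches what the query in the question produces; the second stage is what yields one row per sentence.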