Parsing JSON with U-SQL and converting it to CSV

Asked: 2018-02-02 01:39:54

Tags: json azure-data-factory azure-data-lake u-sql

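I am trying to convert some telemetry data from JSON to CSV and write it out to a file using U-SQL.

The problem is that some of the JSON key names contain periods, so U-SQL does not recognize them when I run the SELECT. When I check the output file, the only value I see is the one for "p1". How should I write the JSON key names in the script so that they are recognized? Thanks in advance for any help!

Code: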

REFERENCE ASSEMBLY MATSDevDB.[Newtonsoft.Json];
REFERENCE ASSEMBLY MATSDevDB.[Microsoft.Analytics.Samples.Formats]; 

USING Microsoft.Analytics.Samples.Formats.Json;


@jsonDocuments = 
    EXTRACT jsonString string 
    FROM @"adl://xxxx.azuredatalakestore.net/xxxx/{*}/{*}/{*}/telemetry_{*}.json" 
    USING Extractors.Tsv(quoting:false);

@jsonify = 
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS json 
    FROM @jsonDocuments;

@columnized = SELECT 
            json["EventInfo.Source"] AS EventInfoSource,
            json["EventInfo.InitId"] AS EventInfoInitId,
            json["EventInfo.Sequence"] AS EventInfoSequence,
            json["EventInfo.Name"] AS EventInfoName,
            json["EventInfo.Time"] AS EventInfoTime,
            json["EventInfo.SdkVersion"] AS EventInfoSdkVersion,
            json["AppInfo.Language"] AS AppInfoLanguage,
            json["UserInfo.Language"] AS UserInfoLanguage,
            json["DeviceInfo.BrowserName"] AS DeviceInfoBrowserName,
            json["DeviceInfo.BrowserVersion"] AS DeviceInfoBrowserVersion,
            json["DeviceInfo.OsName"] AS DeviceInfoOsName,
            json["DeviceInfo.OsVersion"] AS DeviceInfoOsVersion,
            json["DeviceInfo.Id"] AS DeviceInfoId,
            json["p1"] AS p1,
            json["PipelineInfo.AccountId"] AS PipelineInfoAccountId, 
            json["PipelineInfo.IngestionTime"] AS PipelineInfoIngestionTime, 
            json["PipelineInfo.ClientIp"] AS PipelineInfoClientIp,
            json["PipelineInfo.ClientCountry"] AS PipelineInfoClientCountry,
            json["PipelineInfo.IngestionPath"] AS PipelineInfoIngestionPath,
            json["AppInfo.Id"] AS AppInfoId,
            json["EventInfo.Id"] AS EventInfoId,
            json["EventInfo.BaseType"] AS EventInfoBaseType,
            json["EventInfo.IngestionTime"] AS EventInfoIngestionTime
    FROM @jsonify;

OUTPUT @columnized
TO "adl://xxxx.azuredatalakestore.net/poc/TestResult.csv"
USING Outputters.Csv(quoting : false);

JSON:

{ "EventInfo.Source": "JS_default_source", "EventInfo.Sequence": "1", "EventInfo.Name": "daysofweek", "EventInfo.Time": "2018-01-25T21:09:36.779Z", "EventInfo.SdkVersion": "ACT-Web-JS-2.6.0", "AppInfo.Language": "en", "UserInfo.Language": "EN-US", "UserInfo.TimeZone": "-08:00", "DeviceInfo.BrowserName": "Chrome", "DeviceInfo.BrowserVersion": "63.0.3239.132", "DeviceInfo.OsName": "Mac OS X", "DeviceInfo.OsVersion": "10", "p1": "V1", "PipelineInfo.IngestionTime": "2018-01-25T21:09:33.9930000Z", "PipelineInfo.ClientCountry": "CA", "PipelineInfo.IngestionPath": "fastpath", "EventInfo.BaseType": "custom", "EventInfo.IngestionTime": "2018-01-25T21:09:33.9930000Z" }

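1 Answer:

Answer 0 (score: 3)

I got this to work by wrapping each key name in single quotes and square brackets. As far as I can tell, the keys of the map returned by JsonTuple are JSON paths, and a property name that contains a period is written in bracket notation, so the plain dotted name finds nothing while a simple name such as p1 works unchanged. For example: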

@columnized = SELECT 
            json["['EventInfo.Source']"] AS EventInfoSource,
...

Full code:

@columnized = SELECT 
            json["['EventInfo.Source']"] AS EventInfoSource,
            json["['EventInfo.InitId']"] AS EventInfoInitId,
            json["['EventInfo.Sequence']"] AS EventInfoSequence,
            json["['EventInfo.Name']"] AS EventInfoName,
            json["['EventInfo.Time']"] AS EventInfoTime,
            json["['EventInfo.SdkVersion']"] AS EventInfoSdkVersion,
            json["['AppInfo.Language']"] AS AppInfoLanguage,
            json["['UserInfo.Language']"] AS UserInfoLanguage,
            json["['DeviceInfo.BrowserName']"] AS DeviceInfoBrowserName,
            json["['DeviceInfo.BrowserVersion']"] AS DeviceInfoBrowserVersion,
            json["['DeviceInfo.OsName']"] AS DeviceInfoOsName,
            json["['DeviceInfo.OsVersion']"] AS DeviceInfoOsVersion,
            json["['DeviceInfo.Id']"] AS DeviceInfoId,
            json["p1"] AS p1,
            json["['PipelineInfo.AccountId']"] AS PipelineInfoAccountId, 
            json["['PipelineInfo.IngestionTime']"] AS PipelineInfoIngestionTime, 
            json["['PipelineInfo.ClientIp']"] AS PipelineInfoClientIp,
            json["['PipelineInfo.ClientCountry']"] AS PipelineInfoClientCountry,
            json["['PipelineInfo.IngestionPath']"] AS PipelineInfoIngestionPath,
            json["['AppInfo.Id']"] AS AppInfoId,
            json["['EventInfo.Id']"] AS EventInfoId,
            json["['EventInfo.BaseType']"] AS EventInfoBaseType,
            json["['EventInfo.IngestionTime']"] AS EventInfoIngestionTime
    FROM @jsonify;

My results:

[Image: Results]
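
If it is unclear which key names the map actually contains, one way to check is to explode the map into rows and write the keys out. This is only a sketch. It assumes the @jsonify rowset from the question is defined earlier in the same script. The output file name TestKeys.csv and the aliases kv, k, and v are made up for illustration.

```usql
// Diagnostic sketch: list every key/value pair that JsonTuple produced.
// Assumes @jsonify is defined as in the question; the output path is hypothetical.
@keys =
    SELECT kv.k AS KeyName,
           kv.v AS KeyValue
    FROM @jsonify
         CROSS APPLY EXPLODE(json) AS kv(k, v);

OUTPUT @keys
TO "adl://xxxx.azuredatalakestore.net/poc/TestKeys.csv"
USING Outputters.Csv(quoting : false);
```

Keys that contain a period should show up in the bracketed form used above, and those strings can then be copied directly into the json[...] lookups.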