U-SQL: "Array cannot be null" when using the Avro Extractor

Date: 2017-10-19 14:57:41

Tags: azure-data-lake

I am using Event Hubs with Capture to Blob storage, and I have a function based on the AvroSamples that tries to convert the captured files.

Here is my U-SQL script:

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [log4net];
REFERENCE ASSEMBLY [Avro]; 
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];


DECLARE @ABI_DATE string = "2017/10/17/"; //replace by ADF pipeline
DECLARE @input_file string = "wasb://archive@sa/namespace/eh/{*}/" + @ABI_DATE +"{*}/{*}/{*}";
DECLARE @output_file string = @"/output/" + @ABI_DATE + "extract.csv";


@rs =
EXTRACT
        SequenceNumber long
        ,EnqueuedTimeUtc string
        ,Body byte[]
FROM @input_file
USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"
    {
        ""type"":""record"",
        ""name"":""EventData"",
        ""namespace"":""Microsoft.ServiceBus.Messaging"",
        ""fields"":[
            {""name"":""SequenceNumber"",""type"":""long""},
            {""name"":""Offset"",""type"":""string""},
            {""name"":""EnqueuedTimeUtc"",""type"":""string""},
            {""name"":""SystemProperties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
            {""name"":""Properties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
            {""name"":""Body"",""type"":[""null"",""bytes""]}
        ]
    }
");

@cnt =
SELECT 
    SequenceNumber
    ,Encoding.UTF8.GetString(Body) AS Json   //THIS LINE BREAKS !!!!
    ,EnqueuedTimeUtc
FROM @rs;

OUTPUT @cnt TO @output_file USING Outputters.Text();

If I run the same extractor but comment out the Body field, it works as expected.

Here is the error:

  

Inner exception from user expression: Array cannot be null.
Parameter name: bytes
Current row dump: SequenceNumber: 4622 EnqueuedTimeUtc: NULL Body: NULL

Error while evaluating expression Encoding.UTF8.GetString(Body)

1 Answer:

Answer 0: (score: 2)

Florian Mander gave me the explanation:

  

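The extractor works fine. You are just passing a null value (which is there on purpose, since the schema allows it) into a method (Encoding.GetString) that does not accept null as input. With your latest solution, you will lose all records that have no body. Whether that is acceptable is a non-technical decision.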

So here is how to fix it (using a WHERE clause):

@cnt =
SELECT 
    SequenceNumber
    ,Encoding.UTF8.GetString(Body) AS Json
    ,EnqueuedTimeUtc
FROM @rs
WHERE Body != null;
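
As the explanation above notes, the WHERE clause drops every record without a body. If those rows need to be kept, one possible alternative is to guard the call with a C# conditional expression so that a null Body produces a null Json value instead. This is only a sketch: it assumes the same @rs rowset and column names as the script in the question, and it was not part of the original answer.

```
// Sketch: keep rows whose Body is null and emit a null Json value for them.
// Assumes @rs is the rowset produced by the EXTRACT statement in the question.
@cnt =
    SELECT
        SequenceNumber
        ,(Body == null) ? (string) null : Encoding.UTF8.GetString(Body) AS Json
        ,EnqueuedTimeUtc
    FROM @rs;
```

The (string) cast gives both branches of the conditional the same type. Whether to filter the rows out or keep them with a null Json column depends on what the downstream consumer of extract.csv expects.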