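"Array cannot be null" when using the Avro Extractor
I'm using EventHub with capture to Blob storage, and I have a function based on the AvroSamples that tries to convert the captured files.
This is my U-SQL script: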
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [log4net];
REFERENCE ASSEMBLY [Avro];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
DECLARE @ABI_DATE string = "2017/10/17/"; //replace by ADF pipeline
DECLARE @input_file string = "wasb://archive@sa/namespace/eh/{*}/" + @ABI_DATE +"{*}/{*}/{*}";
DECLARE @output_file string = @"/output/" + @ABI_DATE + "extract.csv";
@rs =
EXTRACT
SequenceNumber long
,EnqueuedTimeUtc string
,Body byte[]
FROM @input_file
USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"
{
""type"":""record"",
""name"":""EventData"",
""namespace"":""Microsoft.ServiceBus.Messaging"",
""fields"":[
{""name"":""SequenceNumber"",""type"":""long""},
{""name"":""Offset"",""type"":""string""},
{""name"":""EnqueuedTimeUtc"",""type"":""string""},
{""name"":""SystemProperties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
{""name"":""Properties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
{""name"":""Body"",""type"":[""null"",""bytes""]}
]
}
");
@cnt =
SELECT
SequenceNumber
,Encoding.UTF8.GetString(Body) AS Json //THIS LINE BREAKS !!!!
,EnqueuedTimeUtc
FROM @rs;
OUTPUT @cnt TO @output_file USING Outputters.Text();
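If I run the same extractor but comment out the Body field, it works as expected.
This is the error:
Inner exception from user expression: Array cannot be null. Parameter name: bytes
Current row dump: SequenceNumber: 4622 EnqueuedTimeUtc: NULL Body: NULL
Error while evaluating expression Encoding.UTF8.GetString(Body)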
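Answer 0 (score: 2)
Florian Mander gave me the explanation:
The extractor works fine. You are just passing a null value (intentionally, since the schema allows it) into a method (Encoding.GetString) that does not accept null as input. With your latest solution, you will lose all records that have no body. Whether that is acceptable is a non-technical decision.
So here is how to fix it (using a WHERE clause):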
@cnt =
SELECT
SequenceNumber
,Encoding.UTF8.GetString(Body) AS Json
,EnqueuedTimeUtc
FROM @rs
WHERE Body != null;
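
If dropping the records without a body is not acceptable, one alternative is to guard the call with a C# conditional expression instead of filtering. This is only a sketch and has not been run against the original data. It assumes the same @rs rowset as above, and that a null Json value is acceptable for rows without a body:

```usql
// Sketch: keep every row and decode Body only when it is present.
// Assumption: @rs is the rowset produced by the EXTRACT statement above.
@cnt =
    SELECT
        SequenceNumber
        // The explicit (string) cast makes the type of the null branch unambiguous.
        ,(Body == null ? (string) null : Encoding.UTF8.GetString(Body)) AS Json
        ,EnqueuedTimeUtc
    FROM @rs;
```

With this variant, rows whose Body is null stay in the output with a null Json column, whereas the WHERE clause removes them entirely.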