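I am trying to extract data from AVRO files produced by Event Hub Capture. In most cases this works flawlessly, but certain files cause problems. When I run the following U-SQL job, I get an error: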
USE DATABASE Metrics;
USE SCHEMA dbo;
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
REFERENCE ASSEMBLY [Avro];
REFERENCE ASSEMBLY [log4net];
USING Microsoft.Analytics.Samples.Formats.ApacheAvro;
USING Microsoft.Analytics.Samples.Formats.Json;
USING System.Text;
//DECLARE @input string = "adl://mydatalakestore.azuredatalakestore.net/event-hub-capture/v3/{date:yyyy}/{date:MM}/{date:dd}/{date:HH}/{filename}";
DECLARE @input string = "adl://mydatalakestore.azuredatalakestore.net/event-hub-capture/v3/2018/01/16/19/rcpt-metrics-us-es-eh-metrics-v3-us-0-35-36.avro";
@eventHubArchiveRecords =
EXTRACT Body byte[],
date DateTime,
filename System.String
FROM @input
USING new AvroExtractor(@"
{
""type"":""record"",
""name"":""EventData"",
""namespace"":""Microsoft.ServiceBus.Messaging"",
""fields"":[
{""name"":""SequenceNumber"",""type"":""long""},
{""name"":""Offset"",""type"":""string""},
{""name"":""EnqueuedTimeUtc"",""type"":""string""},
{""name"":""SystemProperties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
{""name"":""Properties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
{""name"":""Body"",""type"":[""null"",""bytes""]}
]
}
");
@json =
SELECT Encoding.UTF8.GetString(Body) AS json
FROM @eventHubArchiveRecords;
OUTPUT @json
TO "/outputs/Avro/testjson.csv"
USING Outputters.Csv(outputHeader : true, quoting : true);
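I get the following error:
Unhandled exception from user code: "The given key was not present in the dictionary."
An unhandled exception from user code has been reported when invoking the method 'Extract' on the user type 'Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor'
Am I right to assume that the problem lies in the AVRO file produced by Event Hub Capture, or is there something wrong with my code?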
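Answer 0: (score: 1)
The "key not present" error refers to the fields in your EXTRACT statement: the extractor cannot find the date and filename fields. I removed those two fields and your script ran correctly in my ADLA instance.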
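A minimal sketch of that change, assuming the rest of the script stays as posted. As far as I can tell, date and filename are only filled in by U-SQL when the path is a file set pattern with {date:...} and {filename} placeholders (like the commented-out @input); with a literal path they are handed to the extractor as ordinary columns, and the sample extractor then looks them up in the Avro record, where they do not exist.

```usql
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
REFERENCE ASSEMBLY [Avro];
REFERENCE ASSEMBLY [log4net];

USING Microsoft.Analytics.Samples.Formats.ApacheAvro;
USING System.Text;

// Literal path to a single file: no {date:...} or {filename} placeholders,
// so no virtual columns are available to the EXTRACT below.
DECLARE @input string = "adl://mydatalakestore.azuredatalakestore.net/event-hub-capture/v3/2018/01/16/19/rcpt-metrics-us-es-eh-metrics-v3-us-0-35-36.avro";

@eventHubArchiveRecords =
    // Only request columns that exist as fields in the Avro schema.
    EXTRACT Body byte[]
    FROM @input
    USING new AvroExtractor(@"
    {
        ""type"":""record"",
        ""name"":""EventData"",
        ""namespace"":""Microsoft.ServiceBus.Messaging"",
        ""fields"":[
            {""name"":""SequenceNumber"",""type"":""long""},
            {""name"":""Offset"",""type"":""string""},
            {""name"":""EnqueuedTimeUtc"",""type"":""string""},
            {""name"":""SystemProperties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
            {""name"":""Properties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
            {""name"":""Body"",""type"":[""null"",""bytes""]}
        ]
    }
    ");

@json =
    SELECT Encoding.UTF8.GetString(Body) AS json
    FROM @eventHubArchiveRecords;

OUTPUT @json
TO "/outputs/Avro/testjson.csv"
USING Outputters.Csv(outputHeader : true, quoting : true);
```

If you do need date and filename in the output, the other option is presumably to go back to the file set form of @input and keep both columns in the EXTRACT.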
Answer 1: (score: 0)
The current implementation only supports primitive types, not the complex types of the Avro specification.
Answer 2: (score: 0)
You have to build and use an extractor based on Apache Avro rather than the sample extractor provided by MS. We went down the same path.