Question

I have a requirement: we need to customize how we load files in Pig using AvroStorage.

For example, I have an Avro file with the following schema:

{"namespace": "avroColorCount",
 "type": "record",
 "name": "User2",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "content", "type" :  "bytes" }
 ]
}

Now, if I use the command below, it works fine:

x = load 'sample.avro' USING AvroStorage() AS (name: chararray, content: bytearray);

But what should I do if I only want to load 'content' (the second column)?

If I try:

x = load 'sample.avro' USING AvroStorage() AS (content: bytearray);

it gives me this error:

ERROR 1031: Incompatable schema: left is "content:bytearray", right is "name: chararray, content: bytearray"

I know this can be done with FILTER.

But our requirement is to get the second column in a single step.

Is this possible?

Thanks in advance...

Answer 1

The following code solved it:

x = LOAD 'sample.avro' USING AvroStorage('{"type":"record","name":"User2","fields":[{"name":"content","type":"bytes"}]}');
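
For comparison, the two-step approach mentioned in the question might look like the sketch below. It assumes the projection is done with FOREACH ... GENERATE rather than FILTER, since FILTER selects rows and not columns. The alias y is a hypothetical name introduced only for illustration.

```pig
-- Step 1: load both fields with the full schema, as in the question.
x = LOAD 'sample.avro' USING AvroStorage() AS (name: chararray, content: bytearray);

-- Step 2: keep only the second column ('y' is an illustrative alias).
y = FOREACH x GENERATE content;

-- Print the schema of the projected relation to confirm that only 'content' remains.
DESCRIBE y;
```

Running DESCRIBE on the relation loaded in a single step is a quick way to check that it exposes the same single field.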
