How do I read JSON data in Pig?

Time: 2016-09-23 14:17:23

Tags: hadoop hive apache-pig hadoop2

I have a JSON file of the following form:

{"employees":[
    {"firstName":"John", "lastName":"Doe"},
    {"firstName":"Anna", "lastName":"Smith"},
    {"firstName":"Peter", "lastName":"Jones"}
]}

I am trying to run the following Pig script to load the JSON data:

A = load 'pigdemo/employeejson.json' using JsonLoader ('employees:{(firstName:chararray)},{(lastName:chararray)}');

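But I get this error:

Unable to recreate exception from backed error: Error: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for ARRAY (from [Source: java.io.ByteArrayInputStream@1553f9b2; line: 1, column: 1]) at [Source: java.io.ByteArrayInputStream@1553f9b2; line: 1, column: 29]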

1 answer:

Answer 0 (score: 1)

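First, you see Unexpected end-of-input because JsonLoader parses the input one line at a time, so the opening line of your file is treated as an incomplete JSON document. Each record must sit on a single line, like this: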

{"employees":[{"firstName":"John", "lastName":"Doe"}, {"firstName":"Anna", "lastName":"Smith"}, {"firstName":"Peter", "lastName":"Jones"}]}
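If the file is pretty-printed across several lines, one way to get the single-line form is to re-serialize it before copying it to HDFS. The sketch below uses Python's standard json module and assumes the file holds exactly one JSON document; the function names compact_json and compact_json_file are made up for illustration and are not part of Pig.

```python
import json


def compact_json(text: str) -> str:
    """Re-serialize one JSON document so that it occupies a single line."""
    # json.dumps without an indent argument emits no line breaks; any newline
    # inside a string value is written as the two-character escape \n.
    return json.dumps(json.loads(text), ensure_ascii=False)


def compact_json_file(src_path: str, dst_path: str) -> None:
    """Read a multi-line JSON file and write it back as one line."""
    with open(src_path, encoding="utf-8") as src:
        line = compact_json(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(line + "\n")


if __name__ == "__main__":
    # The sample from the question, spread over several lines.
    sample = """{"employees":[
    {"firstName":"John", "lastName":"Doe"},
    {"firstName":"Anna", "lastName":"Smith"},
    {"firstName":"Peter", "lastName":"Jones"}
]}"""
    print(compact_json(sample))
```

The output differs from the hand-written line only in whitespace after the colons and commas, which a JSON parser ignores.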

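Now, since each line holds a list of employees, run the following commands ($flurryData is a Pig parameter holding the input path; substitute the path to your own file):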

A = load '$flurryData' using JsonLoader ('employees:bag {t:tuple(firstName:chararray, lastName:chararray)}');
describe A;
dump A;

This produces the following output:

A: {employees: {t: (firstName: chararray,lastName: chararray)}}

({(John,Doe),(Anna,Smith),(Peter,Jones)})

Hope this helps!