Loading a JSON array with Pig

Posted: 2015-09-28 21:32:53

Tags: json apache-pig elephantbird

I have a file in which every line is formatted as a JSON array.

Something like this:
["6400000000",{"status":"FINE","ok":"false","addresses":"00:00:00:00:00:00"}]
["4900000000",{"status":"FINE","ok":"true","addresses":"00:00:00:00:00:00"}]
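To make the shape of this data explicit, here is a small Python sketch (standard library `json` only; it is an illustration, not part of the Pig job) that parses the two sample lines and reports the type of each part. Each line's top-level value is an array holding a string and an object, not an object itself.

```python
import json

# The two sample lines from above, copied verbatim.
lines = [
    '["6400000000",{"status":"FINE","ok":"false","addresses":"00:00:00:00:00:00"}]',
    '["4900000000",{"status":"FINE","ok":"true","addresses":"00:00:00:00:00:00"}]',
]

for line in lines:
    value = json.loads(line)
    # Top level is a JSON array (Python list); element 0 is a string id,
    # element 1 is a JSON object (Python dict).
    print(type(value).__name__, type(value[0]).__name__, type(value[1]).__name__)
```

Both lines print `list str dict`.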

I am running the following on Amazon EMR:

register 's3://mybucket/jar/elephant-bird-core-4.9.jar';
register 's3://mybucket/jar/elephant-bird-pig-4.9.jar';
register 's3://mybucket/jar/elephant-bird-hadoop-compat-4.9.jar';
register 's3://mybucket/jar/json-simple-1.1.jar';

sample = load 's3://mybucket/data/sample.json' using com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') as (json:map[]);

dump sample;

For every line of the JSON file I get the following error:

java.lang.ClassCastException: org.json.simple.JSONArray cannot be cast to org.json.simple.JSONObject
    at com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:158)
    at com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:129)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:562)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:151)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
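The top frame shows `JsonLoader.parseStringToTuple` failing to cast the parsed `org.json.simple.JSONArray` to `JSONObject`, which suggests this loader expects every line to be a top-level JSON object. One possible workaround, sketched below under the assumption that the input can be preprocessed before Pig reads it, is to rewrite each `[id, {...}]` line as an object. The function name `array_line_to_object` and the keys `"id"` and `"record"` are hypothetical choices for illustration, not anything defined by elephant-bird.

```python
import json
from typing import Iterable, Iterator


def array_line_to_object(line: str) -> str:
    """Rewrite one '[id, {...}]' line as a single-line JSON object.

    The output keys "id" and "record" are hypothetical; choose names that fit your schema.
    """
    value = json.loads(line)
    # Expect exactly [string-or-number id, object]; fail loudly otherwise.
    if not (isinstance(value, list) and len(value) == 2 and isinstance(value[1], dict)):
        raise ValueError(f"unexpected line shape: {line!r}")
    # Compact separators keep one record per line, as line-based loaders need.
    return json.dumps({"id": value[0], "record": value[1]}, separators=(",", ":"))


def convert_lines(lines: Iterable[str]) -> Iterator[str]:
    """Convert every non-blank input line, preserving order."""
    for line in lines:
        if line.strip():
            yield array_line_to_object(line)


sample_line = '["6400000000",{"status":"FINE","ok":"false","addresses":"00:00:00:00:00:00"}]'
print(array_line_to_object(sample_line))
```

After such a conversion, each line is an object like `{"id":"6400000000","record":{...}}`, so the `JsonLoader('-nestedLoad')` statement above should no longer hit the array-to-object cast; with `-nestedLoad`, `record` would presumably come back as a nested map. This is untested against EMR and only one option; parsing the original lines with a custom UDF would avoid rewriting the data.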

Am I missing something?
