I have a file where each line is formatted as a JSON array, something like:

["6400000000",{"status":"FINE","ok":"false","addresses":"00:00:00:00:00:00"}]
["4900000000",{"status":"FINE","ok":"true","addresses":"00:00:00:00:00:00"}]
I run the following on Amazon EMR:
register 's3://mybucket/jar/elephant-bird-core-4.9.jar';
register 's3://mybucket/jar/elephant-bird-pig-4.9.jar';
register 's3://mybucket/jar/elephant-bird-hadoop-compat-4.9.jar';
register 's3://mybucket/jar/json-simple-1.1.jar';
sample = load 's3://mybucket/data/sample.json' using com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') as (json:map[]);
dump sample;
I get the following error for every line of JSON:
java.lang.ClassCastException: org.json.simple.JSONArray cannot be cast to org.json.simple.JSONObject
at com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:158)
at com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:129)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:562)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:151)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
Am I missing something?
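The `ClassCastException` suggests the loader is parsing each line and expecting a top-level JSON *object* (`JSONObject`), while my lines are top-level *arrays* (`JSONArray`). If that's the case, one workaround I'm considering is rewrapping each line as an object before loading. A minimal Python sketch of that idea (the key names `id` and `attrs` are my own invention, not part of the data):

```python
import json

def array_line_to_object(line):
    """Rewrap a top-level JSON array line, e.g.
    ["6400000000", {"status": "FINE", ...}]
    as a single JSON object, so a loader that expects a
    JSONObject at the top level can parse it.
    The "id"/"attrs" key names are assumed, not original."""
    key, attrs = json.loads(line)
    return json.dumps({"id": key, "attrs": attrs})

line = '["6400000000",{"status":"FINE","ok":"false","addresses":"00:00:00:00:00:00"}]'
print(array_line_to_object(line))
```

After a pass like this over the input file, each line would be a plain object and the nested fields could be reached through the map keys instead of array positions.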