Using JsonSerDe

Time: 2017-03-03 07:14:18

Tags: amazon-web-services hive emr amazon-emr

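I am trying to import JSON data from S3, run some queries on it, and then export the output back to S3 in JSON format. However, the Hive step on my EMR cluster fails with the error "org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected". To narrow down the problem I simplified both the Hive script and the JSON data, but I keep getting the same error. How can I solve this?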

Cluster configuration:

Release: emr-5.3.1
Hive version: 2.1.1
Hadoop distribution: Amazon 2.7.3
Service role: EMR_DefaultRole
MasterInstanceType: m4.large

Contents of the simplified JSON data:

[{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]

Hive script:

DROP TABLE IF EXISTS SOURCE;
DROP TABLE IF EXISTS DESTINATION;

CREATE EXTERNAL TABLE SOURCE(MyID STRING, MyField STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://myPath/subPath/';

CREATE EXTERNAL TABLE DESTINATION(MyID STRING, MyField STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://anotherPath/subPath/';

INSERT OVERWRITE TABLE DESTINATION SELECT MyID, MyField FROM SOURCE;

Here is the stack trace:

Vertex failed, vertexName=Map 4, vertexId=vertex_1278452616863_0001_1_00, diagnostics=[Task failed, taskId=task_1278452616863, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1278452616863:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:95)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:383)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
    ... 17 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected
    at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:183)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:128)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:92)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:488)
    ... 18 more
Caused by: java.io.IOException: Start token not found where expected
    at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:169)
    ... 21 more

Thanks.

2 Answers:

Answer 0: (score: 2)

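The JSON should start with { rather than with an array ([). With a plain text table, Hive hands each line of the file to JsonSerDe as one record, so each line needs to be a single, complete JSON object.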
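One way to get the data into that shape is to rewrite the array as one JSON object per line before uploading it to S3. The following is a minimal Python sketch, assuming the file is small enough to load into memory. The function name array_to_json_lines is hypothetical and does not come from the thread.

```python
import json


def array_to_json_lines(text):
    """Convert a JSON array of objects into one JSON object per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Compact separators keep each record on a single line without extra spaces.
    lines = [json.dumps(record, separators=(",", ":")) for record in records]
    return "\n".join(lines) + "\n"


# The simplified data from the question.
sample = '[{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]'
converted = array_to_json_lines(sample)
print(converted, end="")
```

The converted text has no surrounding brackets and no commas between records, so every line starts with { as the SerDe expects.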

Answer 1: (score: -1)

I tried this approach and updated my JSON file so that it has the following structure:

{"MyID":"FOO123","MyField":"FOO"},
{"MyID":"BAR123","MyField":"BAR"}

However, once the job finished, I noticed that only the first object had been inserted into the table.
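A quick way to inspect a file like this is to check whether each line parses on its own as a JSON object. The sketch below uses Python's standard json module; the helper name check_json_lines is hypothetical. It shows only that a strict parser rejects the trailing comma on the first line. Whether this is why Hive loaded a single row is not established in this thread.

```python
import json


def check_json_lines(text):
    """Return (line_number, ok, detail) for every non-empty line."""
    results = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            # The line is not a complete, standalone JSON document.
            results.append((number, False, exc.msg))
            continue
        if isinstance(value, dict):
            results.append((number, True, "object"))
        else:
            results.append((number, False, "not a JSON object"))
    return results


# The structure shown in this answer, including the comma after the first object.
sample = '{"MyID":"FOO123","MyField":"FOO"},\n{"MyID":"BAR123","MyField":"BAR"}\n'
report = check_json_lines(sample)
for number, ok, detail in report:
    print(number, "ok" if ok else "invalid", detail)
```

Removing the comma at the end of the first line makes both lines pass this check.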