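I get a "not a SequenceFile" error when performing an outer join. The same query used to work with the same settings and similar tables, but I don't know what has changed since then: the error now appears when I join fairly large tables over a large key space.
I am running Hive 0.13.1 on Cloudera 5.3.0 with YARN. Both tables are STORED AS orc tblproperties ("orc.compress"="SNAPPY").
Storage information: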
SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Diagnostic messages for this task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://my_cluster:9000/user/hive/warehouse/my_table/000000_0 not a SequenceFile
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://my_cluster:9000/user/hive/warehouse/my_table/000000_0 not a SequenceFile
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
    at org.apache.hadoop.hive.
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 35 Reduce: 1 Cumulative CPU: 2742.67 sec HDFS Read: 8762733372 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 45 minutes 42 seconds 670 msec
In my .hiverc:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=10000;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.created.files=150000;
set hive.error.on.empty.partition=true;
set hive.cli.print.header=true;
set hive.optimize.s3.query=true;
set hive.auto.convert.join=true;
set mapred.child.java.opts=-Xmx2048m;
set hive.error.on.empty.partition=false;
set hive.hadoop.supports.splittable.combineinputformat=true;
set hive.enforce.bucketing=true;
set hive.optimize.bucketmapjoin=true;
set hive.mapjoin.smalltable.filesize=50000000;
set hive.resultset.use.unique.column.names=false;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;
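The list above is long, and hive.error.on.empty.partition appears in it twice (first true, later false). Because a later set overrides an earlier one, it can help to flag keys that are assigned more than once. The following is only a rough Python sketch and is not part of Hive; the function name find_repeated_settings and the sample text are made up for illustration.

```python
import re
from collections import OrderedDict

# Matches lines such as "set hive.cbo.enable=true;" (spaces around "=" allowed).
SET_LINE = re.compile(r"^\s*set\s+([\w.]+)\s*=\s*(.*?)\s*;?\s*$", re.IGNORECASE)


def find_repeated_settings(text):
    """Return {key: [values in file order]} for keys assigned more than once."""
    seen = OrderedDict()
    for line in text.splitlines():
        match = SET_LINE.match(line)
        if match:
            key, value = match.groups()
            seen.setdefault(key, []).append(value)
    return {key: values for key, values in seen.items() if len(values) > 1}


if __name__ == "__main__":
    # Illustrative excerpt; in practice the text would be read from the .hiverc file.
    sample = """
    set hive.error.on.empty.partition=true;
    set hive.cli.print.header=true;
    set hive.error.on.empty.partition=false;
    set hive.vectorized.execution.enabled = true;
    """
    for key, values in find_repeated_settings(sample).items():
        # The last value in the list is the one that ends up in effect.
        print(key, "->", values)
```

Run against the settings listed here, the only key this should report is hive.error.on.empty.partition.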
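I tried declaring both tables as sequencefile instead, but then I get a different error, IndexOutOfBound, on the full-size tables, though not on a small sample.
The metastore is MySQL.
The full list of Hive / Hadoop settings is long, but I can look it up; I just don't know what to look for.
If this is related to IO or HDFS corruption, how can I check HDFS health?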
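One cheap check that is separate from block-level HDFS health (which is what hdfs fsck reports on): ORC files begin with the three bytes ORC and SequenceFiles begin with SEQ, so the header of the file named in the error shows which format it actually is. Below is a minimal Python sketch, assuming the file has first been copied to local disk (for example with hdfs dfs -get); detect_format, detect_file and the local file name are hypothetical.

```python
# Leading magic bytes of the two container formats mentioned in this question.
MAGIC = {
    b"ORC": "ORC",
    b"SEQ": "SequenceFile",
}


def detect_format(header):
    """Classify a file by its first bytes; returns 'unknown' if nothing matches."""
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return "unknown"


def detect_file(path):
    """Read the first three bytes of a local file and classify it."""
    with open(path, "rb") as handle:
        return detect_format(handle.read(3))


# Hypothetical usage, after copying the file out of HDFS:
#   print(detect_file("000000_0"))
```

If the file under the ORC table's directory is reported as ORC, then the file is probably what the table definition says it is, and the complaint more likely comes from whatever is trying to read it as a SequenceFile than from damaged data. That is an inference from the header alone, not a full integrity check.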