Question

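I am getting a "not a SequenceFile" error when performing an outer join. The same query used to work with the same settings on similar tables, but I can't tell what has changed such that I now get this error when joining fairly large tables over a large key space.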

I am running Hive 0.13.1 on Cloudera 5.3.0 with YARN. Both tables are STORED AS orc tblproperties ("orc.compress"="SNAPPY").

Storage information:

SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:  No

Diagnostic messages for this task:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://my_cluster:9000/user/hive/warehouse/my_table/000000_0 not a SequenceFile
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://my_cluster:9000/user/hive/warehouse/my_table/000000_0 not a SequenceFile
at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
at org.apache.hadoop.hive.
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 35  Reduce: 1   Cumulative CPU: 2742.67 sec   HDFS Read: 8762733372 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 45 minutes 42 seconds 670 msec

In my .hiverc:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=10000;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.created.files=150000;
set hive.error.on.empty.partition=true;
set hive.cli.print.header=true;
set hive.optimize.s3.query=true;
set hive.auto.convert.join=true;
set mapred.child.java.opts=-Xmx2048m;
set hive.error.on.empty.partition=false;
set hive.hadoop.supports.splittable.combineinputformat=true;
set hive.enforce.bucketing=true;
set hive.optimize.bucketmapjoin=true;
set hive.mapjoin.smalltable.filesize=50000000;
set hive.resultset.use.unique.column.names=false;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;

I tried declaring both tables as sequencefile, but then I get a different error, IndexOutOfBound, on the full-size tables, though not on a small sample.

The metastore is MySQL.

The full list of Hive/Hadoop settings is long, but I can look it up; I just don't know what to look for.

If this is related to IO or HDFS corruption, how can I check HDFS health?
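For context, here is a rough sketch of the kind of check I have in mind. It is not a confirmed diagnosis. It relies on the fact that ORC files begin with the bytes "ORC" and Hadoop SequenceFiles begin with "SEQ", so looking at the first three bytes of the file named in the error should show which format it really is. The function name `detect_format` is my own, and the commented usage lines assume the `hdfs` CLI is on the PATH. Note that `hdfs fsck` reports block-level health (missing or corrupt blocks), not whether the file format matches what Hive expects.

```shell
#!/bin/sh
# Classify a file by its leading magic bytes, read from stdin.
# ORC files begin with "ORC"; Hadoop SequenceFiles begin with "SEQ".
# detect_format is a hypothetical helper name, not part of Hive or Hadoop.
detect_format() {
    magic=$(head -c 3)
    case "$magic" in
        ORC) echo "orc" ;;
        SEQ) echo "sequencefile" ;;
        *)   echo "unknown" ;;
    esac
}

# Usage against HDFS (path taken from the error message above):
#   hdfs dfs -cat hdfs://my_cluster:9000/user/hive/warehouse/my_table/000000_0 | detect_format
#
# Block-level health report for the table directory:
#   hdfs fsck /user/hive/warehouse/my_table -files -blocks -locations
```

If the file reports "orc" while the reducer is trying to open it as a SequenceFile, that would point at how the job reads the file rather than at HDFS corruption.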

The tables are in ORC "SNAPPY" format.

0 answers: