Question

用这个创建的表：

create table syslog_staged (id string, facility string, sender string, severity string, tstamp string, service string, msg string) partitioned by (hostname string,  year string, month string, day string) clustered by (id) into 20 buckets stored as orc tblproperties("transactional"="true");

该表填充了Apache nifi的PutHiveStreaming ......

alter table syslog_staged partition (hostname="cloudserver19", year="2016", month="10", day="24") compact 'major';

现在事实证明压缩因某种原因而失败.....（来自工作经历）

No of maps and reduces are 0 job_1476884195505_0031
Job commit failed: java.io.FileNotFoundException: File hdfs://hadoop1.openstacksetup.com:8020/apps/hive/warehouse/log.db/syslog_staged/hostname=cloudserver19/year=2016/month=10/day=24/_tmp_27c40005-658e-48c1-90f7-2acaa124e2fa does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:904)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:113)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:966)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:962)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:962)
at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorOutputCommitter.commitJob(CompactorMR.java:776)
at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

来自hive Metastore log：

2016-10-24 16:33:35,503 WARN  [Thread-14]: compactor.Initiator (Initiator.java:run(132)) - Will not initiate compaction for log.syslog_staged.hostname=cloudserver19/year=2016/month=10/day=24 since last hive.compactor.initiator.failed.compacts.threshold attempts to compact it failed.

Answer 1

请设置以下属性以优化事务表的压缩 -

set hive.compactor.worker.threads=1; set hive.compactor.initiator.on=true;

我假设您已设置以下事务性Hive属性

set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; set hive.support.concurrency=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.enforce.bucketing=true;

hive事务表压缩失败

1 个答案: