I am testing with Hive 1.2.1 and Tez 0.7, but I run into problems when I run UPDATE and DELETE against an ACID table. Here is the table definition:
CREATE EXTERNAL TABLE IF NOT EXISTS working.dw_items_w
(
column definitions
)
CLUSTERED BY (id) into 5000 buckets
STORED AS ORC
LOCATION '/sys/edw/working/dw_items_w2'
TBLPROPERTIES ("transactional"="true");
The UPDATE statement is:
update working.dw_items_w
set
  PROCESS_FLAG = (case
      when (TGT_LSTG_STATUS_ID = 1 and datediff(to_date(SALE_END), to_date(TGT_AUCT_END_DT)) <> 0)
        or (TGT_LSTG_STATUS_ID in (1, 2) and NEW_LSTG_STATUS_ID in (0, 4))
      then 'D'
      when (TGT_LSTG_STATUS_ID = 1 and NEW_LSTG_STATUS_ID = 1 and datediff(to_date(SALE_END), to_date(TGT_AUCT_END_DT)) = 0)
        or (TGT_LSTG_STATUS_ID = 2 and NEW_LSTG_STATUS_ID = 1)
      then 'X'
      else PROCESS_FLAG
    end),
  NEW_LSTG_STATUS_ID = (case
      when TGT_LSTG_STATUS_ID = 0 and NEW_LSTG_STATUS_ID = 0
        and to_date(SALE_END) < date_sub(to_date(from_unixtime(unix_timestamp(), 'yyyy-MM-dd')), 92)
        and to_date(SALE_END) <> to_date('1969-12-31')
      then 1
      else NEW_LSTG_STATUS_ID
    end)
where PROCESS_FLAG = 'U';
The error is:
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1650)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":19,"bucketid":471,"rowid":0}},"value":ignore}}
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:302)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:249)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
	... 14 more
Answer:
Add the following to hive-site.xml:
<property>
<name>hive.enforce.bucketing</name>
<value>true</value>
</property>
<property>
<name>hive.compactor.initiator.on</name>
<value>true</value>
</property>
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.compactor.worker.threads</name>
<value>1</value>
</property>
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
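For a quick test without editing hive-site.xml, the client-side settings above can also be applied per session in the Hive CLI or Beeline. This is a sketch based on the Hive Transactions documentation for Hive 1.x, not part of the original answer. It assumes the two hive.compactor.* properties stay in the metastore's hive-site.xml, because they are read by the metastore service rather than by the client session.

```sql
-- Session-level equivalents of the client-side ACID settings (Hive 1.x).
-- Assumption: run these in the same Hive CLI / Beeline session, before the UPDATE or DELETE.
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Not set here: hive.compactor.initiator.on and hive.compactor.worker.threads
-- are metastore-side settings and belong in the metastore's hive-site.xml.
```

Running SET with only the property name (for example SET hive.txn.manager;) prints the value the session is actually using, which is a simple way to confirm that hive-site.xml was picked up.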
Then make sure you create the ORC table with bucketing:
CREATE TABLE IF NOT EXISTS foo.tableinfo (
  schema_name varchar(32),
  table_name varchar(64),
  department varchar(64),
  country varchar(64),
  state varchar(64),
  city varchar(64),
  granularity int,
  ... varchar(256)
)
CLUSTERED BY (table_name) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB", 'transactional'='true');
Then the following will work:
DELETE FROM foo.tableinfo WHERE table_name = 'foo';