Question

My query gets stuck at 99% in the reducer step. I am using Hive 1.2.

DB.acct_dim is a persistent table partitioned on etl_load_month, while the stage table stg.acct_dim_incre is a daily truncate-and-load table that holds the delta rows.

The target table has billions of rows, while the stage table can have millions of rows per day.

Is there any way to optimize the query below so that it runs faster?

If I specify the exact partition to read from the target, it works fine, but I want to pull multiple partitions from the target.

set hive.exec.parallel=true;
set mapred.compress.map.output=true;
set mapred.output.compress=true;
set hive.exec.compress.output=true;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.auto.convert.join=false;

SELECT
A.col
,A.col2
,A.col3
FROM DB.acct_dim A -- BIG table (billions of rows)
LEFT OUTER JOIN
stg.acct_dim_incre B -- SMALL table (millions of rows)
ON (A.etl_load_month = B.etl_load_month
      AND A.acct_key = B.acct_key)
WHERE A.etl_load_month IN ( --partition column
        SELECT etl_load_month
        FROM stg.acct_dim_incre
        )
AND B.acct_key IS NULL
;

Hive query stuck in the reducer step

0 answers: