Can Redshift use the results of a subquery to filter by sortkey?

Asked: 2017-03-21 21:20:27

Tags: amazon-redshift

I have a table in Redshift with a few billion rows that looks like this:

CREATE TABLE channels (
 fact_key TEXT NOT NULL distkey,
 job_key BIGINT,
 channel_key TEXT NOT NULL
)
diststyle key
compound sortkey(job_key, channel_key);

When I query by job_key + channel_key and use specific values for channel_key in the query, the full sortkey correctly restricts my seq scan.

EXPLAIN
SELECT * FROM channels scd
WHERE scd.job_key = 1 AND scd.channel_key IN ('1234', '1235', '1236', '1237')

XN Seq Scan on channels scd  (cost=0.00..3178474.92 rows=3428929 width=77)
  Filter: ((((channel_key)::text = '1234'::text) OR ((channel_key)::text = '1235'::text) OR ((channel_key)::text = '1236'::text) OR ((channel_key)::text = '1237'::text)) AND (job_key = 1))

However, if I filter channel_key using IN + a subquery, Redshift does not use the sortkey.

EXPLAIN
SELECT * FROM channels scd
WHERE scd.job_key = 1 AND scd.channel_key IN (select distinct channel_key from other_channel_list where job_key = 14 order by 1)

XN Hash IN Join DS_DIST_ALL_NONE  (cost=3.75..3540640.36 rows=899781 width=77)
  Hash Cond: (("outer".channel_key)::text = ("inner".channel_key)::text)
  ->  XN Seq Scan on channels scd  (cost=0.00..1765819.40 rows=141265552 width=77)
        Filter: (job_key = 1)
  ->  XN Hash  (cost=3.75..3.75 rows=1 width=402)
        ->  XN Subquery Scan "IN_subquery"  (cost=0.00..3.75 rows=1 width=402)
              ->  XN Unique  (cost=0.00..3.74 rows=1 width=29)
                    ->  XN Seq Scan on other_channel_list  (cost=0.00..3.74 rows=1 width=29)
                          Filter: (job_key = 14)

Is there any way to make this work? My ultimate goal is to turn this into a view, so pre-defining my list of channel_keys won't work.

Edit to provide more context:

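This is part of a larger query, and the result of this gets hash-joined to some other data. If I hard-code the channel_keys, the input to the hash join is about 2 million rows. If I use the IN condition with the subquery (no other changes), the input to the hash join is 400 million rows. Total query time goes from about 40 seconds to over 15 minutes.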
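Both EXPLAIN plans above show a plain Seq Scan on channels, so the plan text alone does not show whether the sortkey narrowed the scan. One way to check, as a rough sketch, is to run each variant of the query and then inspect the scan steps in the SVL_QUERY_SUMMARY system view from the same session. This assumes the query under test was the last one run in the session and that the session has access to the system views.

```sql
-- Sketch: run the query under test first, then this, in the same session.
-- pg_last_query_id() returns the ID of the most recently run query
-- in the current session.
SELECT query,
       seg,
       step,
       label,            -- step description; scan steps start with 'scan'
       rows,             -- rows emitted by the step
       rows_pre_filter,  -- rows read before filtering; compare with "rows"
       is_rrscan         -- 't' if a range-restricted scan was used
FROM svl_query_summary
WHERE query = pg_last_query_id()
  AND label LIKE 'scan%'
ORDER BY seg, step;
```

Comparing is_rrscan and the row counts for the hard-coded IN list against the subquery version should show whether the difference between 2 million and 400 million rows comes from the scan on channels.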

1 Answer:

Answer 0: (score: 0)

Does this give you a better plan than the subquery version?

with other_channels as (
    select distinct channel_key from other_channel_list where job_key = 14 order by 1
)
SELECT *
FROM channels scd
JOIN other_channels ocd on scd.channel_key = ocd.channel_key
WHERE scd.job_key = 1
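To compare against the two plans in the question, the same query can be prefixed with EXPLAIN. Since the stated goal is a view, the sketch below also wraps the join in one. The view name channels_for_job_1 is hypothetical, the job_key values are hard-coded exactly as in the question, and scd.* is selected because a view cannot expose two columns both named channel_key. Whether this yields a better plan is not confirmed here.

```sql
-- Sketch: the answer's query with EXPLAIN, for comparison with the
-- plans shown in the question.
EXPLAIN
WITH other_channels AS (
    SELECT DISTINCT channel_key
    FROM other_channel_list
    WHERE job_key = 14
)
SELECT scd.*
FROM channels scd
JOIN other_channels ocd ON scd.channel_key = ocd.channel_key
WHERE scd.job_key = 1;

-- Hypothetical view wrapping the same join (the name is an assumption).
CREATE VIEW channels_for_job_1 AS
WITH other_channels AS (
    SELECT DISTINCT channel_key
    FROM other_channel_list
    WHERE job_key = 14
)
SELECT scd.*
FROM channels scd
JOIN other_channels ocd ON scd.channel_key = ocd.channel_key
WHERE scd.job_key = 1;
```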