Question

I am currently working with a large amount of data in Redshift, and I would like to ask what the best table design is.

The table looks like this.

tmp_tbl_yyyy_MM
user_id int,
position_x int,
position_y int,
date date,
type int
sortkey (date)

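About 1 billion rows are inserted every month. I often scan the last 3 months of data, and according to {{3}}, time-separated tables seem to be a good fit. So I split the table by month, with a suffix such as "_yyyy_MM".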
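For reference, the monthly layout described above might be sketched as follows. The table names tmp_tbl_2024_01 to tmp_tbl_2024_03 and the view name tmp_tbl_last_3_months are hypothetical, and "date" is quoted only as a precaution because it is also a type name.

```sql
-- Hypothetical monthly table; the columns follow the schema listed above.
create table tmp_tbl_2024_01 (
    user_id    int,
    position_x int,
    position_y int,
    "date"     date,
    type       int
)
sortkey ("date");

-- Later months can copy the definition (LIKE also inherits the sort key).
create table tmp_tbl_2024_02 (like tmp_tbl_2024_01);
create table tmp_tbl_2024_03 (like tmp_tbl_2024_01);

-- Assumed convenience view over the three most recent months.
create view tmp_tbl_last_3_months as
select * from tmp_tbl_2024_01
union all
select * from tmp_tbl_2024_02
union all
select * from tmp_tbl_2024_03;
```

With this layout, dropping an old month is a DROP TABLE on that month's table, and the view is then recreated over the remaining months.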

Below is a sample of the query I run frequently.

select user_id from(
select * from tmp_tbl_yyyy_MM
union all
select * from tmp_tbl_yyyy_MM
) 
where
(
(position_x between ? and ?
and position_y between ? and ?)
or
(position_x between ? and ?
and position_y between ? and ?)
or ...
)
and date between ? and ?
and type = ?;

The position_x, position_y condition is repeated more than 1,000 times.

The plan for this query is a sequential scan, so it is very slow. Please tell me the best way to get the same result.
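The plan mentioned above can be inspected with EXPLAIN. This is a minimal sketch against one hypothetical monthly table, tmp_tbl_2024_01, with made-up literal values in place of the ? placeholders.

```sql
-- Show the plan without running the query.
-- The table name and the literal values are assumptions for illustration.
explain
select user_id
from tmp_tbl_2024_01
where "date" between '2024-01-01' and '2024-01-31'
  and type = 1
  and position_x between 100 and 200
  and position_y between 300 and 400;
```

As far as I understand, Redshift has no conventional indexes, so a Seq Scan step appears in the plan even when a sort key is defined. The plan alone therefore may not show how many blocks are actually skipped thanks to the sort key.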

I guess the key points are the table design, the query, and the sort key.

Is UNION ALL bad? Should I not split the table by month?
Should the WHERE clause go inside the subqueries?
Should I set an interleaved sort key on all of the conditions, e.g. position_x, position_y, date, type?
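To make the last two options concrete, here is a rough sketch of what each could look like. It is not a tested recommendation: the table names tmp_tbl_2024_01, tmp_tbl_2024_02 and tmp_tbl_interleaved are hypothetical, and the ? placeholders are kept from the original query.

```sql
-- Option: repeat the filter inside each branch of the UNION ALL,
-- so every monthly table is filtered before the rows are combined.
select user_id from (
    select user_id from tmp_tbl_2024_01
    where "date" between ? and ?
      and type = ?
      and ((position_x between ? and ? and position_y between ? and ?)
        or (position_x between ? and ? and position_y between ? and ?))
    union all
    select user_id from tmp_tbl_2024_02
    where "date" between ? and ?
      and type = ?
      and ((position_x between ? and ? and position_y between ? and ?)
        or (position_x between ? and ? and position_y between ? and ?))
) as t;

-- Option: an interleaved sort key over all four filter columns.
create table tmp_tbl_interleaved (
    user_id    int,
    position_x int,
    position_y int,
    "date"     date,
    type       int
)
interleaved sortkey (position_x, position_y, "date", type);
```

Note that in the first variant the date and type conditions are ANDed with the whole group of position ranges, which requires the extra pair of parentheses around the OR list.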

Redshift tuning for time-separated tables

0 answers: