I'm working on Redshift, and I have a table like this:
userid  oid  version  number_of_objects
1       ab   1        10
1       ab   2        20
1       ab   3        17
1       ab   4        16
1       ab   5        14
1       cd   1        5
1       cd   2        6
1       cd   3        9
1       cd   4        12
2       ef   1        4
2       ef   2        3
2       gh   1        16
2       gh   2        12
2       gh   3        21
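From this table I want to select the maximum version number for each oid, and get the userid and number_of_objects of that row.
When I tried this, I unfortunately got the whole table back: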
SELECT MAX(version), oid, userid, number_of_objects
FROM table
GROUP BY oid, userid, number_of_objects
LIMIT 10;
But the result I'm actually looking for is:
userid  oid  MAX(version)  number_of_objects
1       ab   5             14
1       cd   4             12
2       ef   2             3
2       gh   3             21
DISTINCT ON apparently doesn't work either; it says:
SELECT DISTINCT ON is not supported
Do you have any ideas?
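UPDATE: In the meantime I came up with this workaround, but I feel it isn't the smartest solution. It is also very slow. But at least it works. Just in case: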
SELECT * FROM table,
(SELECT MAX(version) as maxversion, oid, userid
FROM table
GROUP BY oid, userid
) as maxtable
WHERE table.oid = maxtable.oid
AND table.userid = maxtable.userid
AND table.version = maxtable.maxversion
LIMIT 100;
Do you have a better solution?
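For anyone reproducing the first problem: the whole table comes back because number_of_objects is part of the GROUP BY, so in this sample every row ends up in its own group. The sketch below checks that on the sample data. It is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for Redshift, and the table name the_table and the helper count_groups are made up for the example.

```python
import sqlite3

# Sample rows from the question: (userid, oid, version, number_of_objects).
ROWS = [
    (1, "ab", 1, 10), (1, "ab", 2, 20), (1, "ab", 3, 17), (1, "ab", 4, 16),
    (1, "ab", 5, 14), (1, "cd", 1, 5), (1, "cd", 2, 6), (1, "cd", 3, 9),
    (1, "cd", 4, 12), (2, "ef", 1, 4), (2, "ef", 2, 3), (2, "gh", 1, 16),
    (2, "gh", 2, 12), (2, "gh", 3, 21),
]


def count_groups(group_by_columns):
    """Return how many groups MAX(version) is computed over (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Redshift here
    conn.execute(
        "CREATE TABLE the_table "
        "(userid INTEGER, oid TEXT, version INTEGER, number_of_objects INTEGER)"
    )
    conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", ROWS)
    query = (
        "SELECT COUNT(*) FROM ("
        f"SELECT MAX(version) FROM the_table GROUP BY {group_by_columns}"
        ") g"
    )
    (n_groups,) = conn.execute(query).fetchone()
    conn.close()
    return n_groups


if __name__ == "__main__":
    # Grouping by number_of_objects as well yields one group per row.
    print(count_groups("oid, userid, number_of_objects"))
    # Grouping by oid and userid only yields one group per oid.
    print(count_groups("oid, userid"))
```

Dropping number_of_objects from the GROUP BY gives one group per oid, but then that column can no longer be selected directly, which is why a join back to the table or a window function is needed.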
Answer 0 (score: 7)
If Redshift has window functions, you can try this:
SELECT *
FROM (
select oid,
userid,
version,
max(version) over (partition by oid, userid) as max_version
from the_table
) t
where version = max_version;
I would expect this to be faster than the self-join with a GROUP BY.
Another option is to use the row_number() function:
SELECT *
FROM (
select oid,
userid,
version,
row_number() over (partition by oid, userid order by version desc) as rn
from the_table
) t
where rn = 1;
Which of the two to use is mostly a matter of personal taste. Performance-wise, I would not expect any difference.
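As a quick sanity check, both window-function queries can be run against the sample data from the question. This is a minimal sketch under assumptions, not a Redshift benchmark: it uses Python's built-in sqlite3 module as a stand-in (window functions need SQLite 3.25 or newer), number_of_objects is added to the select list because the question asks for it, and the helper name latest_versions is made up for the example.

```python
import sqlite3

# Sample rows from the question: (userid, oid, version, number_of_objects).
ROWS = [
    (1, "ab", 1, 10), (1, "ab", 2, 20), (1, "ab", 3, 17), (1, "ab", 4, 16),
    (1, "ab", 5, 14), (1, "cd", 1, 5), (1, "cd", 2, 6), (1, "cd", 3, 9),
    (1, "cd", 4, 12), (2, "ef", 1, 4), (2, "ef", 2, 3), (2, "gh", 1, 16),
    (2, "gh", 2, 12), (2, "gh", 3, 21),
]

# Variant 1: compare each row's version with the maximum of its partition.
MAX_OVER = """
SELECT userid, oid, version, number_of_objects
FROM (
    SELECT userid, oid, version, number_of_objects,
           max(version) OVER (PARTITION BY oid, userid) AS max_version
    FROM the_table
) t
WHERE version = max_version
ORDER BY userid, oid
"""

# Variant 2: number the rows of each partition, newest version first.
ROW_NUMBER = """
SELECT userid, oid, version, number_of_objects
FROM (
    SELECT userid, oid, version, number_of_objects,
           row_number() OVER (PARTITION BY oid, userid
                              ORDER BY version DESC) AS rn
    FROM the_table
) t
WHERE rn = 1
ORDER BY userid, oid
"""


def latest_versions(query):
    """Load the sample rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Redshift here
    conn.execute(
        "CREATE TABLE the_table "
        "(userid INTEGER, oid TEXT, version INTEGER, number_of_objects INTEGER)"
    )
    conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_versions(ROW_NUMBER):
        print(row)
```

On this sample both variants return the four rows the question asks for. They would differ only if two rows of the same oid shared the maximum version: the max() variant keeps all of them, while row_number() keeps exactly one.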
Answer 1 (score: 0)
select distinct
first_value(userid) over(
partition by oid
order by version desc
rows between unbounded preceding and unbounded following
) as userid
, oid
, first_value(version) over(
partition by oid
order by version desc
rows between unbounded preceding and unbounded following
) as max_version
, first_value(number_of_objects) over(
partition by oid
order by version desc
rows between unbounded preceding and unbounded following
) as number_of_objects
from table
order by oid;
See the AWS Redshift documentation for first_value.
If version is nullable, don't forget nulls last in the order by.
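To see why the NULL placement matters: Redshift sorts NULLs first in a descending order by default, so a row with a NULL version would win first_value(). The sketch below illustrates the effect under assumptions. It runs on Python's built-in sqlite3 module rather than Redshift, and because SQLite's default NULL ordering differs, the placement is emulated with an explicit (version IS NULL) sort key. The data and the helper pick_number_of_objects are hypothetical.

```python
import sqlite3

# Hypothetical data: one oid with versions 1 and 2 plus a row whose version is NULL.
ROWS = [(1, "ab", 1, 10), (1, "ab", 2, 20), (1, "ab", None, 99)]

# (version IS NULL) is 1 for NULL rows and 0 otherwise, so sorting it
# DESC puts NULLs first and sorting it ASC puts NULLs last.
QUERY = """
SELECT DISTINCT oid,
       first_value(number_of_objects) OVER (
           PARTITION BY oid
           ORDER BY (version IS NULL) {null_dir}, version DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS number_of_objects
FROM the_table
"""


def pick_number_of_objects(nulls_first):
    """Return (oid, number_of_objects) rows for the chosen NULL placement."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Redshift here
    conn.execute(
        "CREATE TABLE the_table "
        "(userid INTEGER, oid TEXT, version INTEGER, number_of_objects INTEGER)"
    )
    conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", ROWS)
    null_dir = "DESC" if nulls_first else "ASC"
    result = conn.execute(QUERY.format(null_dir=null_dir)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # NULLs first: the NULL-version row is picked.
    print(pick_number_of_objects(nulls_first=True))
    # NULLs last: the row with the highest real version is picked.
    print(pick_number_of_objects(nulls_first=False))
```

In Redshift itself the same effect comes from writing order by version desc nulls last in the window definition, as the answer suggests.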
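Answer 2 (score: 0)
Long story short: go with the horse (@a_horse_with_no_name's window-function approach).
The asker's approach should be faster on smaller tables and when fetching a sample of the data, but the window approach will be more consistent in performance and faster on the whole table.
Below are some EXPLAIN results from a table of mine that has 17 columns, 184 121 798 rows and 12 809 740 unique IDs (14 versions per ID on average, but there can be up to 40).
Quick summary:
Tomi's approach: cost = 5983958.76..67801689853856.94 (6 * 10^6 for the first row, 7 * 10^13 for the whole table)
@a_horse_with_no_name's approach: cost = 1000027117538.39..1000031720583.59 (10^12 for any query)
@Merlin: almost exactly the same as the approach above.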
explain
SELECT * FROM table t,
(SELECT MAX(version) as maxversion, id
FROM table
GROUP BY id
) as maxtable
WHERE t.id = maxtable.id
AND t.version = maxtable.maxversion;
XN Hash Join DS_DIST_NONE (cost=5983958.76..67801689853856.94 rows=63811541 width=590)
Hash Cond: ((("outer".id)::text = ("inner".id)::text) AND ("outer".version = "inner".maxversion))
-> XN Seq Scan on equipment_visits ev (cost=0.00..1841218.08 rows=184121808 width=418)
-> XN Hash (cost=5063349.72..5063349.72 rows=184121808 width=172)
-> XN Subquery Scan maxtable (cost=2761827.12..5063349.72 rows=184121808 width=172)
-> XN HashAggregate (cost=2761827.12..3222131.64 rows=184121808 width=44)
-> XN Seq Scan on equipment_visits (cost=0.00..1841218.08 rows=184121808 width=44)
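So the costs for the first row and for all rows are 5983958.76 (6 * 10^6) and 67801689853856.94 (7 * 10^13), respectively.
Both solutions offered by @a_horse_with_no_name have almost exactly the same plan, so I'll paste only one of them: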
explain
SELECT *
FROM (
select *,
row_number() over (partition by id order by version desc) as rn
from table
) t
where rn = 1;
gives:
Filter: (rn = 1)
-> XN Window (cost=1000027117538.39..1000029419060.99 rows=184121808 width=44)
Partition: id
Order: version
-> XN Sort (cost=1000027117538.39..1000027577842.91 rows=184121808 width=44)
Sort Key: id, version
-> XN Seq Scan on table (cost=0.00..1841218.08 rows=184121808 width=44)
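The solution offered by @Merlin seems incomplete, since it does not return all the values of the latest version, but its performance is similar to the second option: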
explain
select distinct
id
, first_value(version) over(
partition by id
order by version desc
rows between unbounded preceding and unbounded following
) as max_version
, first_value(additional_col) over(
partition by id
order by version desc
rows between unbounded preceding and unbounded following
) as additional_col
from table t;
gives:
XN Unique (cost=1000027117538.39..1000032180888.11 rows=184121808 width=84)
-> XN Window (cost=1000027117538.39..1000030799974.55 rows=184121808 width=84)
Partition: id
Order: version
-> XN Sort (cost=1000027117538.39..1000027577842.91 rows=184121808 width=84)
Sort Key: id, version
-> XN Seq Scan on table (cost=0.00..1841218.08 rows=184121808 width=84)