Question

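I have data with columns zone and starttime. I want to query the n most recent groups of data, where a group is defined as consecutive records (ordered by starttime) that have the same zone, with no records from another zone between them.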

In this example n is 4: the first group is two A records, then two B records, then a single A record, and then three C records.
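To make the group definition concrete: once rows are sorted newest first, a group is a maximal run of consecutive rows with the same zone, so the two A runs count as separate groups because B rows lie between them. A minimal Python sketch using `itertools.groupby`; the zone sequence is hypothetical, shaped like the example above, and `latest_groups` is an illustrative helper name.

```python
from itertools import groupby

# Hypothetical zone sequence, already sorted by starttime DESC (newest first):
# 2 A, 2 B, 1 A, 3 C, then some older rows.
zones = ["A", "A", "B", "B", "A", "C", "C", "C", "B", "B", "A"]

def latest_groups(zones, n):
    """Return the n newest runs of consecutive equal zones as (zone, count)."""
    # groupby only merges *adjacent* equal items, which is exactly the group rule.
    runs = [(zone, len(list(rows))) for zone, rows in groupby(zones)]
    return runs[:n]

print(latest_groups(zones, 4))  # [('A', 2), ('B', 2), ('A', 1), ('C', 3)]
```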

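I have a query that does this correctly: http://sqlfiddle.com/#!17/ffbee/1 However, it may not be efficient on large tables, because it first selects all of the data and only then narrows it down to the rows I need. I know this could be written as a stored procedure, but I'd like to know whether it can be done declaratively, in plain SQL only.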

Update

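I benchmarked the original query, @Sentinel's query, and an application-side solution that fetches 20 results at a time and checks whether the required number of groups has been reached. n is 4, group sizes are random between 10 and 20, and there are 4 zones.

[Chart: all solutions]
[Chart: Sentinel's query vs. the application solution]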
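The application-side approach described above (fetch 20 rows at a time, stop once enough groups have been seen) might look roughly like this minimal sketch. The function name `fetch_latest_groups`, the LIMIT/OFFSET paging and the sample rows are assumptions for illustration, not the benchmark tool's actual code; the `encounter(zone, starttime)` schema follows the answer's query. Group n is only known to be complete once a row from a different zone appears, so the loop stops at the first row of group n+1.

```python
import sqlite3

def fetch_latest_groups(conn, n, page_size=20):
    """Return (zone, starttime) rows of the n newest zone groups, paging newest first."""
    result = []
    groups = 0
    prev_zone = None
    offset = 0
    while True:
        page = conn.execute(
            "SELECT zone, starttime FROM encounter "
            "ORDER BY starttime DESC LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not page:
            return result  # table exhausted before group n+1 was seen
        for zone, starttime in page:
            if zone != prev_zone:  # a zone change starts a new group
                groups += 1
                if groups > n:
                    return result  # group n is complete
                prev_zone = zone
            result.append((zone, starttime))
        offset += page_size

# Hypothetical sample data, newest first: 2 A, 2 B, 1 A, 3 C, then older rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE encounter (zone TEXT NOT NULL, starttime INTEGER NOT NULL)")
zones = ["A", "A", "B", "B", "A", "C", "C", "C", "B", "B", "A"]
conn.executemany(
    "INSERT INTO encounter VALUES (?, ?)",
    [(zone, 100 - i) for i, zone in enumerate(zones)],
)
rows = fetch_latest_groups(conn, 4, page_size=3)
print(len(rows))  # 8
```

OFFSET paging makes the database re-skip earlier rows on every page; keyset paging (`WHERE starttime < last_seen`) would avoid that, assuming starttime values are unique.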

Sentinel's query performs best: it has constant complexity and is faster than the application solution. Thanks :)

Source code of the benchmark tool, if anyone is interested: https://gitlab.com/virtual92/groups-of-data-timeline-sql-benchmark
Chart source: https://plot.ly/~Vistritium/14/

Answer 1

I can't comment on how efficient this code is in PostgreSQL, but it does avoid the self-join used in your example, and it uses fewer SELECT statements:

with t1 as (
select e.*
     -- Detect the zones leading edges
     , case when zone = lag(zone) over (order by starttime desc)
            then 0 -- Same zone as previous 
            else 1 -- Found a leading edge
       end edge
  from encounter e
), t2 as (
select t1.*
     -- Turn the edges into groups
     , sum(edge) over (order by starttime desc rows between unbounded preceding and current row) grp
  from t1
)
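select * from t2
where grp <= 4              -- keep only the 4 most recent groups
order by starttime desc;    -- without ORDER BY the output order is not guaranteed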
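To sanity-check the edge-detection plus running-sum idea outside PostgreSQL, here is a sketch that runs the same query against an in-memory SQLite database through Python's `sqlite3` module. It assumes the bundled SQLite is 3.25 or newer (needed for window functions); the sample rows are hypothetical, and the group limit is passed as a bind parameter instead of the literal 4.

```python
import sqlite3

# Same logic as the answer: flag a leading edge whenever the zone differs from
# the previous (newer) row, then a running sum of edges numbers the groups.
QUERY = """
with t1 as (
  select e.*,
         case when zone = lag(zone) over (order by starttime desc)
              then 0 else 1 end as edge
    from encounter e
), t2 as (
  select t1.*,
         sum(edge) over (order by starttime desc
                         rows between unbounded preceding and current row) as grp
    from t1
)
select zone, starttime, grp from t2
 where grp <= ?
 order by starttime desc
"""

# Hypothetical sample data, newest first: 2 A, 2 B, 1 A, 3 C, then older rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE encounter (zone TEXT NOT NULL, starttime INTEGER NOT NULL)")
zones = ["A", "A", "B", "B", "A", "C", "C", "C", "B", "B", "A"]
conn.executemany(
    "INSERT INTO encounter VALUES (?, ?)",
    [(zone, 100 - i) for i, zone in enumerate(zones)],
)

rows = conn.execute(QUERY, (4,)).fetchall()
for row in rows:
    print(row)  # (zone, starttime, grp)
```

On the first row `lag(zone)` is NULL, so the comparison yields NULL and the CASE falls through to 1, which correctly starts group 1.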

How to efficiently query groups of data that are adjacent to each other in a timeline