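Select the previous date for each row in a table

Time: 2012-01-13 19:23:58

Tags: sql sql-server select

I have a table that tracks machine maintenance, which can happen at any point in time. Here is a simplified table structure: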

Maintenance Table
----------------------------------------
ID            - integer
DateCompleted - date
MachineName   - varchar

Here is some sample table data:

ID   DateCompleted MachineName
----------------------------------------
1     1/6/2011     'Machine 1'
2     1/13/2011    'Machine 2'
3     1/14/2011    'Machine 1'
4     2/2/2011     'Machine 3'
5     2/26/2011    'Machine 1'
6     3/9/2011     'Machine 2'
7     4/20/2011    'Machine 3'

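What I want is a query that returns, for each maintenance task, the date of the previous maintenance task on the same machine. So the result set would look like this: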

ID   MachineName  CurDate     PrevDate
----------------------------------------
1    'Machine 1'    1/6/2011    NULL
2    'Machine 2'    1/13/2011   NULL
3    'Machine 1'    1/14/2011   1/6/2011
4    'Machine 3'    2/2/2011    NULL
5    'Machine 1'    2/26/2011   1/14/2011
6    'Machine 2'    3/9/2011    1/13/2011
7    'Machine 3'    4/20/2011   2/2/2011

What is the best way to write this kind of query? The only idea I have so far is this:

SELECT ID, MachineName, DateCompleted AS CurDate,
  (
    SELECT TOP 1 DateCompleted FROM Maintenance m2
    WHERE m1.MachineName = m2.MachineName
      AND m1.DateCompleted > m2.DateCompleted
    ORDER BY DateCompleted DESC
  ) AS PrevDate
FROM Maintenance m1
ORDER BY ID

Any ideas, suggestions, or corrections are very welcome.

7 answers:

Answer 0 (score: 1)

How about this:

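SELECT
    m.ID, m.MachineName, m.DateCompleted AS CurDate, MAX(m_past.DateCompleted) AS PrevDate
FROM Maintenance m
    LEFT JOIN Maintenance m_past
    ON m.MachineName = m_past.MachineName
    -- Keep the date filter in the ON clause: in a WHERE clause it would
    -- discard rows that have no earlier maintenance (PrevDate = NULL).
    AND m_past.DateCompleted < m.DateCompleted
-- SQL Server requires every non-aggregated column in the GROUP BY.
GROUP BY m.ID, m.MachineName, m.DateCompleted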

Answer 1 (score: 1)

Try this:

SELECT A.Id, A.MachineName, A.DateCompleted [CurDate], B.DateCompleted PrevDate
FROM Maintenance  A
OUTER APPLY (SELECT TOP 1 *
             FROM Maintenance 
             WHERE MachineName = A.MachineName AND DateCompleted < A.DateCompleted
             ORDER BY DateCompleted DESC) B

Answer 2 (score: 1)

Whether TOP n works depends on your DBMS. MAX() will work across platforms. Index DateCompleted and MachineName; both are used in the WHERE clause.

select m1.id, m1.machinename, m1.datecompleted as curdate,
  ( select max(datecompleted)
    from maintenance
    where machinename = m1.machinename
      and datecompleted < m1.datecompleted ) as prevdate
from maintenance m1
order by machinename, curdate

If your DBMS supports window functions, you can use

select m1.id, m1.machinename, m1.datecompleted as curdate, 
       max(datecompleted) over (partition by machinename 
                                order by m1.datecompleted 
                                rows between unbounded preceding 
                                         and 1 preceding) as prevdate
from maintenance m1

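I wouldn't guess which one will be faster. I would load a table with the amount of sample data you expect and test both. Then I would reload it with 10 times as much data and test again.

While testing, you will want to learn how to generate and read an execution plan.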
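The load-test-reload approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for the real DBMS, the row counts, the random data, and the index name `ix_machine_date` are made up, and SQLite timings say nothing about how SQL Server would behave. The point is the shape of the experiment: same data, two queries, then ten times the data.

```python
import random
import sqlite3
import time
from datetime import date, timedelta

# Portable correlated subquery using MAX().
SUBQUERY_SQL = """
SELECT m1.ID,
       (SELECT MAX(m2.DateCompleted)
        FROM Maintenance m2
        WHERE m2.MachineName = m1.MachineName
          AND m2.DateCompleted < m1.DateCompleted) AS PrevDate
FROM Maintenance m1
ORDER BY m1.ID
"""

# Self-join with the date filter in the ON clause, so rows without a
# previous date are kept with PrevDate = NULL.
JOIN_SQL = """
SELECT m.ID, MAX(p.DateCompleted) AS PrevDate
FROM Maintenance m
LEFT JOIN Maintenance p
       ON p.MachineName = m.MachineName
      AND p.DateCompleted < m.DateCompleted
GROUP BY m.ID
ORDER BY m.ID
"""

def build_table(n_rows, n_machines=20, seed=0):
    """Create an in-memory Maintenance table filled with synthetic rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Maintenance ("
        "ID INTEGER PRIMARY KEY, DateCompleted TEXT, MachineName TEXT)"
    )
    start = date(2011, 1, 1)
    rows = [
        (i,
         (start + timedelta(days=rng.randrange(3650))).isoformat(),
         f"Machine {rng.randrange(n_machines) + 1}")
        for i in range(1, n_rows + 1)
    ]
    conn.executemany("INSERT INTO Maintenance VALUES (?, ?, ?)", rows)
    # Hypothetical index name; covers both columns used in the filter.
    conn.execute(
        "CREATE INDEX ix_machine_date ON Maintenance (MachineName, DateCompleted)"
    )
    return conn

def timed(conn, sql):
    """Run a query and return (rows, elapsed seconds)."""
    t0 = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - t0

# Test at one size, then again with ten times the data.
for n in (500, 5000):
    conn = build_table(n)
    sub_rows, sub_t = timed(conn, SUBQUERY_SQL)
    join_rows, join_t = timed(conn, JOIN_SQL)
    print(f"{n} rows: subquery {sub_t:.4f}s, join {join_t:.4f}s")
```

Comparing the two result sets before comparing the timings is a cheap sanity check that the queries being raced actually answer the same question.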

Answer 3 (score: 1)

Solution:

declare @tmp table (Id int, DateCompleted datetime, MachineName varchar(100))
insert into @tmp
select 1,'1/6/2011','Machine 1'
union select 2,'1/13/2011',    'Machine 2'
union select 3,'1/14/2011',    'Machine 1'
union select 4,'2/2/2011',     'Machine 3'
union select 5,'2/26/2011',    'Machine 1'
union select 6,'3/9/2011',     'Machine 2'
union select 7,'4/20/2011',    'Machine 3'



select t.Id, t.DateCompleted, t.MachineName, max(t2.DateCompleted) PrevDate
from @tmp t
left join @tmp t2
    on t.MachineName = t2.MachineName
    and t.DateCompleted > t2.DateCompleted
group by t.Id, t.DateCompleted, t.MachineName

Answer 4 (score: 1)

Since you said you welcome any solution, here is one in ANSI SQL:

SELECT ID, 
       DateCompleted, 
       MachineName, 
       lag(DateCompleted) over (partition by MachineName order by DateCompleted) as PrevDate 
FROM Maintenance 
ORDER BY id;

This works on PostgreSQL, Oracle, DB2, and Teradata.

SQL Server does not support the lag() function yet, but the upcoming "Denali" release (2012) will have it.
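One detail worth checking before swapping lag() in for the `<` comparison used elsewhere in this thread: lag() returns the value from the previous row, while MAX() with `DateCompleted <` returns the latest strictly earlier date. The two agree on the sample data but can differ when a machine has two tasks on the same date. The sketch below illustrates this with Python's built-in `sqlite3` module as a stand-in database (it assumes SQLite 3.25 or later for window functions); the duplicate-date row and the ID tie-breaker in the ORDER BY are additions for the demonstration, not part of the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Maintenance ("
    "ID INTEGER PRIMARY KEY, DateCompleted TEXT, MachineName TEXT)"
)
conn.executemany(
    "INSERT INTO Maintenance VALUES (?, ?, ?)",
    [
        (1, "2011-01-06", "Machine 1"),
        (2, "2011-01-14", "Machine 1"),
        (3, "2011-01-14", "Machine 1"),  # hypothetical second task on the same day
    ],
)

# lag(): value from the previous row in the partition. ID is added as a
# tie-breaker so the row order is deterministic.
lag_rows = conn.execute("""
    SELECT ID, DateCompleted,
           lag(DateCompleted) OVER (PARTITION BY MachineName
                                    ORDER BY DateCompleted, ID) AS PrevDate
    FROM Maintenance
    ORDER BY ID
""").fetchall()

# MAX() subquery: latest date strictly earlier than the current row's date.
max_rows = conn.execute("""
    SELECT m1.ID, m1.DateCompleted,
           (SELECT MAX(m2.DateCompleted)
            FROM Maintenance m2
            WHERE m2.MachineName = m1.MachineName
              AND m2.DateCompleted < m1.DateCompleted) AS PrevDate
    FROM Maintenance m1
    ORDER BY m1.ID
""").fetchall()

for lag_row, max_row in zip(lag_rows, max_rows):
    print(lag_row, max_row)
```

For row 3, lag() reports the same-day task from row 2, whereas the subquery skips back to the earlier date. Which behavior is wanted depends on whether same-day tasks should count as "previous".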

答案 5 :(得分:1)

从SQL Server 2012开始,您可以使用窗口化聚合来编写所需的查询。只需使用以下代码:

select
    ID, 
    MachineName, 
    DateCompleted AS CurDate,
    min(DateCompleted) 
        over (partition by MachineName order by DateCompleted 
            rows between 1 preceding and 1 preceding) as PrevDate
from Maintenance
order by Id

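Answer 6 (score: 0)

Your query seems reasonable to me and is easy to understand. Ignoring the possible cost of the final sort, I think the complexity is essentially O(n log n), assuming a suitable index exists. For each row in the table, the query engine has to find the entry with the previous date, which should be O(log n) with the right index.

One way to possibly improve performance, at the cost of code complexity, is to write a stored procedure that produces the result. I think the unordered result could be produced in O(n). The procedure could walk two cursors over the table ordered by MachineName, then DateCompleted. As it walks the two cursors, it can build the result set in O(n). However, the result would then need to be sorted by ID, which is O(n log n). So I think the theoretical complexity is the same as the query's, but the procedure might reduce overhead and run faster. Still, I definitely would not recommend this solution, because it would be ugly and hard to maintain.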
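To make the linear-scan idea concrete, here is a minimal sketch in Python rather than a stored procedure. It is a simplified variant, not the two-cursor procedure described above: it makes one pass over the rows ordered by machine and date and remembers the previous date instead of advancing a second cursor. The function name `previous_dates` and the tuple layout are assumptions for illustration. Note that the sketch sorts in memory, which is itself O(n log n); the O(n) claim applies only when an index already delivers the rows in (MachineName, DateCompleted) order.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

def previous_dates(rows):
    """rows: iterable of (id, date_completed, machine_name) tuples.

    Returns (id, machine_name, cur_date, prev_date) tuples ordered by id,
    where prev_date is the latest strictly earlier date for that machine.
    """
    # Stand-in for reading through an index on (MachineName, DateCompleted).
    ordered = sorted(rows, key=lambda r: (r[2], r[1]))
    result = []
    for machine, group in groupby(ordered, key=itemgetter(2)):
        prev_date = None  # latest date strictly before the current row's date
        last_date = None  # date of the most recently visited row
        for row_id, completed, _ in group:
            # Only move prev_date forward when the date actually changes,
            # so same-day tasks share the same previous date.
            if last_date is not None and completed > last_date:
                prev_date = last_date
            result.append((row_id, machine, completed, prev_date))
            last_date = completed
    # Final sort by ID, the O(n log n) step mentioned above.
    result.sort(key=itemgetter(0))
    return result

# Sample data from the question.
sample = [
    (1, date(2011, 1, 6), "Machine 1"),
    (2, date(2011, 1, 13), "Machine 2"),
    (3, date(2011, 1, 14), "Machine 1"),
    (4, date(2011, 2, 2), "Machine 3"),
    (5, date(2011, 2, 26), "Machine 1"),
    (6, date(2011, 3, 9), "Machine 2"),
    (7, date(2011, 4, 20), "Machine 3"),
]

result = previous_dates(sample)
for row in result:
    print(row)
```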