Sql查询返回时间轴的结果集

时间:2014-05-14 22:34:31

标签: sql sql-server group-by

我认为描述我要找的内容的最佳方式是显示一个数据表以及我想从Query中返回的内容。这是SQL Server中的一个简单数据表:

JobNumber TimeOfWeigh 
100       01/01/2014 08:00 
100       01/01/2014 09:00 
100       01/01/2014 10:00 
200       01/01/2014 12:00 
200       01/01/2014 13:00 
300       01/01/2014 15:00 
300       01/01/2014 16:00 
100       02/01/2014 08:00 
100       02/01/2014 09:00 
100       03/01/2014 10:00 

我想要一个查询,它将对作业进行分组并返回每个组的第一个和最后一个DateTime。但是,正如您在这里看到的那样,有100套100个工作号码。我不希望第二组加入第一组。

相反,我想这样:

JobNumber   First Weigh         Last Weigh
100         01/01/2014 08:00    01/01/2014 10:00
200         01/01/2014 12:00    01/01/2014 13:00
300         01/01/2014 15:00    01/01/2014 16:00
100         02/01/2014 08:00    03/01/2014 10:00

我一直在努力奋斗数小时。任何帮助将不胜感激。

EDITED

日期&时代都只是虚拟的随机数据。实际数据在一天内有数千个重量。我希望每个作业的第一个和最后一个重量来确定作业的持续时间,以便我可以在时间线上表示持续时间。但是我希望两次显示Job 100,表示它已暂停并在200&之后恢复完成300个

4 个答案:

答案 0 :(得分:2)

这是我对此的尝试,使用row_number()和分区。我把它分解成了一些步骤,希望能让它变得更容易理解。如果您的表中已经有一个包含整数标识符的列,那么您可以省略第一个CTE。即使在那之后,您也可以进一步简化这一过程,但它似乎确实有效。

(编辑添加一个标志,指示评论中要求的具有多个范围的作业。)

declare @sampleData table (JobNumber int, TimeOfWeigh datetime);
insert into @sampleData values
    (100, '01/01/2014 08:00'),
    (100, '01/01/2014 09:00'), 
    (100, '01/01/2014 10:00'),
    (200, '01/01/2014 12:00'),
    (200, '01/01/2014 13:00'),
    (300, '01/01/2014 15:00'),
    (300, '01/01/2014 16:00'),
    (100, '02/01/2014 08:00'),
    (100, '02/01/2014 09:00'),
    (100, '03/01/2014 10:00');

-- The first CTE assigns an ordering to the records according to TimeOfWeigh,
-- producing the row numbers you gave in your example.
with JobsCTE as
(    
    select 
        row_number() over (order by TimeOfWeigh) as RowNumber, 
        JobNumber,
        TimeOfWeigh
    from @sampleData
),

-- The second CTE orders by the RowNumber we created above, but restarts the
-- ordering every time the JobNumber changes. The difference between RowNumber
-- and this new ordering will be constant within each group.
GroupsCTE as
(
    select
        RowNumber - row_number() over (partition by JobNumber order by RowNumber) as GroupNumber,
        JobNumber,
        TimeOfWeigh
    from JobsCTE
),

-- Join by JobNumber alone to determine which jobs appear multiple times.
DuplicatedJobsCTE as
(
    select JobNumber 
    from GroupsCTE 
    group by JobNumber 
    having count(distinct GroupNumber) > 1
)

-- Finally, we use GroupNumber to get the mins and maxes from contiguous ranges.
select
    G.JobNumber,
    min(G.TimeOfWeigh) as [First Weigh],
    max(G.TimeOfWeigh) as [Last Weigh],
    case when D.JobNumber is null then 0 else 1 end as [Multiple Ranges]
from
    GroupsCTE G
    left join DuplicatedJobsCTE D on G.JobNumber = D.JobNumber
group by
    G.JobNumber,
    G.GroupNumber,
    D.JobNumber
order by
    [First Weigh];

答案 1 :(得分:0)

您必须使用自联接来创建包含每个集合中第一行和最后一行的伪表。

Select F.JobNumber, 
   f.TimeOfWeigh FirstWeigh, 
   l.TimeOfWeigh LastWeigh
From table f -- for first record
   join table l -- for last record
       on l.JobNumber = f.JobNumber 
          And Not exists
              (Select * from table
               Where JobNumber = f.JobNumber 
                  And id = f.id-1)
          And Not exists
              (Select * from table
               Where JobNumber = f.JobNumber 
                  And id = l.id+1)
          And Not Exists
              (Select * from table
               Where JobNumber <> f.JobNumber 
                  And id Between f.Id and l.Id)

答案 2 :(得分:0)

当我看到它时,这个让我着迷,我想知道如何解决这个问题。我太忙了,不能先得到一个答案,我以后才开始工作但是从那以后就已经开了几天了!几天后,我仍然明白我的设计,这是一个好兆头:)

我在最后添加了一些额外的数据,以证明这适用于单行JobNumber条目,而不是假设称量总是分批,但结果中的第一行与原始解决方案匹配。

这种方法也使用级联CTE (比这里接受的答案多一个,但我不会让我灰心!)第一个是测试数据设置:

With Weighs AS   -- sample data
(
SELECT 100 AS JobNumber, '01/01/2014 08:00' AS TimeOfWeigh UNION ALL 
SELECT 100 AS JobNumber, '01/01/2014 09:00' AS TimeOfWeigh UNION ALL 
SELECT 100 AS JobNumber, '01/01/2014 10:00' AS TimeOfWeigh UNION ALL 
SELECT 200 AS JobNumber, '01/01/2014 12:00' AS TimeOfWeigh UNION ALL 
SELECT 200 AS JobNumber, '01/01/2014 13:00' AS TimeOfWeigh UNION ALL 
SELECT 300 AS JobNumber, '01/01/2014 15:00' AS TimeOfWeigh UNION ALL 
SELECT 300 AS JobNumber, '01/01/2014 16:00' AS TimeOfWeigh UNION ALL 
SELECT 100 AS JobNumber, '02/01/2014 08:00' AS TimeOfWeigh UNION ALL 
SELECT 100 AS JobNumber, '02/01/2014 09:00' AS TimeOfWeigh UNION ALL 
SELECT 100 AS JobNumber, '03/01/2014 10:00' AS TimeOfWeigh UNION ALL
SELECT 400 AS JobNumber, '04/01/2014 14:00' AS TimeOfWeigh UNION ALL
SELECT 300 AS JobNumber, '04/01/2014 14:30' AS TimeOfWeigh
)
,
Numbered AS  -- add on a unique consecutive row number
( SELECT *, ROW_NUMBER() OVER (ORDER BY TimeOfWeigh) AS ID FROM Weighs )
, 
GroupEnds AS  -- add on a 1/0 flag for whether it's the first or last in a run
( SELECT *,
    CASE WHEN -- next row is different JobNumber?
      (SELECT ID FROM Numbered n2 WHERE n2.ID=n1.ID+1 AND n2.JobNumber=n1.JobNumber) IS NULL
    THEN 1 ELSE 0 END AS GroupEnd,
    CASE WHEN -- previous row is different JobNumber?
      (SELECT ID FROM Numbered n2 WHERE n2.ID=n1.ID-1 AND n2.JobNumber=n1.JobNumber) IS NULL
    THEN 1 ELSE 0 END AS GroupBegin
  FROM Numbered n1 
)
,
Begins_and_Ends AS  -- make sure there are always matching pairs
( SELECT * FROM GroupEnds WHERE GroupBegin=1
    UNION ALL
  SELECT * FROM GroupEnds WHERE GroupEnd=1
)
,
Pairs AS  -- give matching pairs the same ID number for GROUPing next..
( SELECT *, (1+Row_Number() OVER (ORDER BY ID))/2 AS PairID
  FROM Begins_and_Ends
)
SELECT
  Min(JobNumber) AS JobNumber,
  Min(TimeOfWeigh) as [First Weigh],
  Max(TimeOfWeigh) as [Last Weigh]
FROM Pairs
GROUP BY PairID
ORDER BY PairID

Numbered CTE非常明显,每行都有一个有序的ID号。

CTE GroupEnds添加一对布尔值 - 如果该行是JobNumbers运行中的第一个或最后一个,则为1或0 - 通过尝试查看下一行或上一行是否是相同的JobNumber。 / p>

从那里我只需要一种方法来配对相邻的GroupBegins和GroupEnds。我使用N-tile排名函数NTILE()通过将rowcount除以2来计算GroupEnds并将该结果选为NTILE()的参数来生成这些数字 - 但是当有奇数行到期时这会破坏到单行批次,其中同一行是批次的开始和结束。

我通过保证相同数量的Begin和End行来解决这个问题:一个UNION of Begin行和End行,即使有些行是相同的行。这是CTE Begins_and_Ends

Pairs CTE使用Row_Number()除以2对对数进行加法 - 对于行对,整数结果PairID是相同的。

这给了我们以下内容 - JobNumber批次中间的所有行都已被过滤掉了:

JOBNUMBER  TIMEOFWEIGH     ID  End? Begin PairID
100     01/01/2014 08:00    1   0   1     1
100     01/01/2014 10:00    3   1   0     1
200     01/01/2014 12:00    4   0   1     2
200     01/01/2014 13:00    5   1   0     2
300     01/01/2014 15:00    6   0   1     3
300     01/01/2014 16:00    7   1   0     3
100     02/01/2014 08:00    8   0   1     4
100     03/01/2014 10:00    10  1   0     4
400     04/01/2014 14:00    11  1   1     5
400     04/01/2014 14:00    11  1   1     5
300     04/01/2014 14:30    12  1   1     6
300     04/01/2014 14:30    12  1   1     6

从那里开始,它现在是GROUP PairID的最后一块蛋糕,并抓住了第一个和最后一个称重时间。我很喜欢这个挑战,我想知道是否有其他人认为它在任何称重中都有用! http://sqlfiddle.com/#!3/b4f39/48

答案 3 :(得分:0)

是的,这是一个令人着迷的思维难题。谢谢你的分享。我想提出一种不涉及EXISTS或JOINS的解决方案

首先,我创建了一个带有job_id(j_id)和整数值的表,该表用于排序(j_v)。整数更容易键入,而逻辑与日期时间完全相同。

     select * from j order by j_v;
 j_id | j_v 
------+-----
  100 |   1
  100 |   2
  100 |   2
  100 |   2
  100 |   2
  100 |   3
  200 |   4
  200 |   5
  300 |   6
  300 |   6
  300 |   6
  300 |   7
  300 |   7
  100 |   8
  100 |   9
(15 rows)

我使用了Windows函数和3个CTE:

  • 第一个添加表格中的超前和滞后
  • 第二个过滤器仅保留那些开始或结束工作的行
  • 第三个人介绍了用于删除所有偶数行的row_number。

您在这里:

with X AS (
select j_id, j_v,
       coalesce ( lag(j_id,1) OVER (MY_W), -1)  as j_id_lag,
       lag(j_v,1) over (MY_W) as j_v_lag,
       coalesce ( lead(j_id,1) OVER (MY_W), -1)  as j_id_lead,
       lead(j_v,1) over (MY_W) as j_v_lead
from j
WINDOW MY_W as ( ORDER BY j_v)
order by j_v 
),
Y AS ( 
select *
from X
where j_id_lag != j_id_lead
),
Z AS ( 
select * ,
      lead(j_v) OVER () AS L2,
      row_number() OVER () as my_row
from Y
) 
SELECT j_id, j_v as job_start ,l2 as job_end
from Z
where my_row %2 = 1
;
 j_id | job_start | job_end
------+-----+----
  100 |   1 |  3
  200 |   4 |  5
  300 |   6 |  7
  100 |   8 |  9
(4 rows)

以下是查询计划:

                                                    QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 CTE Scan on z  (cost=325.94..379.17 rows=11 width=12) (actual time=0.047..0.071 rows=4 loops=1)
   Filter: ((my_row % 2::bigint) = 1)
   Rows Removed by Filter: 4
   CTE x
     ->  WindowAgg  (cost=149.78..203.28 rows=2140 width=8) (actual time=0.027..0.039 rows=15 loops=1)
           ->  Sort  (cost=149.78..155.13 rows=2140 width=8) (actual time=0.019..0.019 rows=15 loops=1)
                 Sort Key: j.j_v
                 Sort Method: quicksort  Memory: 25kB
                 ->  Seq Scan on j  (cost=0.00..31.40 rows=2140 width=8) (actual time=0.004..0.006 rows=15 loops=1)
   CTE y
     ->  CTE Scan on x  (cost=0.00..48.15 rows=2129 width=24) (actual time=0.031..0.050 rows=8 loops=1)
           Filter: (j_id_lag <> j_id_lead)
           Rows Removed by Filter: 7
   CTE z
     ->  WindowAgg  (cost=0.00..74.51 rows=2129 width=24) (actual time=0.042..0.062 rows=8 loops=1)
           ->  CTE Scan on y  (cost=0.00..42.58 rows=2129 width=24) (actual time=0.031..0.052 rows=8 loops=1)
 Total runtime: 0.122 ms
(17 rows)

如您所见,只有一种(按序列值或原始问题中的时间对数据进行排序)和几种CTE扫描,但没有联接。复杂性-NlogN正是我想要的那种。