I think the best way to describe what I'm looking for is to show a table of data and what I want back from my query. This is a simple data table in SQL Server:
JobNumber TimeOfWeigh
100 01/01/2014 08:00
100 01/01/2014 09:00
100 01/01/2014 10:00
200 01/01/2014 12:00
200 01/01/2014 13:00
300 01/01/2014 15:00
300 01/01/2014 16:00
100 02/01/2014 08:00
100 02/01/2014 09:00
100 03/01/2014 10:00
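I want a query that groups the jobs and returns the first and last DateTime for each group. However, as you can see here, there are two sets of rows for job number 100. I don't want the second set to be merged with the first.
Instead, I would like this: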
JobNumber First Weigh Last Weigh
100 01/01/2014 08:00 01/01/2014 10:00
200 01/01/2014 12:00 01/01/2014 13:00
300 01/01/2014 15:00 01/01/2014 16:00
100 02/01/2014 08:00 03/01/2014 10:00
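I've been struggling with this for hours. Any help would be appreciated.
EDITED
The dates and times are all just dummy random data. The actual data has thousands of weighs within a single day. I want the first and last weigh of each job so that I can determine the job's duration and represent it on a timeline. But I want Job 100 displayed twice, showing that it was paused and then resumed after jobs 200 and 300 completed.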
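Answer 0 (score: 2)
Here's my attempt at this, using row_number() with partitioning. I've broken it into steps to make it easier to follow. If your table already has a column with an integer identifier (one that is gap-free and increases with TimeOfWeigh), you can omit the first CTE. Even then you could probably simplify this further, but it does seem to work.
(Edited to add a flag indicating jobs with multiple ranges, as requested in the comments.)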
declare @sampleData table (JobNumber int, TimeOfWeigh datetime);
insert into @sampleData values
(100, '01/01/2014 08:00'),
(100, '01/01/2014 09:00'),
(100, '01/01/2014 10:00'),
(200, '01/01/2014 12:00'),
(200, '01/01/2014 13:00'),
(300, '01/01/2014 15:00'),
(300, '01/01/2014 16:00'),
(100, '02/01/2014 08:00'),
(100, '02/01/2014 09:00'),
(100, '03/01/2014 10:00');
-- The first CTE assigns an ordering to the records according to TimeOfWeigh,
-- producing the row numbers you gave in your example.
with JobsCTE as
(
select
row_number() over (order by TimeOfWeigh) as RowNumber,
JobNumber,
TimeOfWeigh
from @sampleData
),
-- The second CTE orders by the RowNumber we created above, but restarts the
-- ordering every time the JobNumber changes. The difference between RowNumber
-- and this new ordering will be constant within each group.
GroupsCTE as
(
select
RowNumber - row_number() over (partition by JobNumber order by RowNumber) as GroupNumber,
JobNumber,
TimeOfWeigh
from JobsCTE
),
-- Group by JobNumber alone to determine which jobs appear in more than one range.
DuplicatedJobsCTE as
(
select JobNumber
from GroupsCTE
group by JobNumber
having count(distinct GroupNumber) > 1
)
-- Finally, we use GroupNumber to get the mins and maxes from contiguous ranges.
select
G.JobNumber,
min(G.TimeOfWeigh) as [First Weigh],
max(G.TimeOfWeigh) as [Last Weigh],
case when D.JobNumber is null then 0 else 1 end as [Multiple Ranges]
from
GroupsCTE G
left join DuplicatedJobsCTE D on G.JobNumber = D.JobNumber
group by
G.JobNumber,
G.GroupNumber,
D.JobNumber
order by
[First Weigh];
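To sanity-check the row-number difference trick outside the database, here is a minimal Python sketch of the same steps. It is only an illustration: the function name `contiguous_ranges` is hypothetical, and the day/month/year date format is an assumption (the sample dates parse either way).

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question; the date format is assumed to be day/month/year.
rows = [
    (100, "01/01/2014 08:00"), (100, "01/01/2014 09:00"), (100, "01/01/2014 10:00"),
    (200, "01/01/2014 12:00"), (200, "01/01/2014 13:00"),
    (300, "01/01/2014 15:00"), (300, "01/01/2014 16:00"),
    (100, "02/01/2014 08:00"), (100, "02/01/2014 09:00"), (100, "03/01/2014 10:00"),
]

def contiguous_ranges(rows):
    """Return (job, first, last) per contiguous run, mirroring the CTE steps."""
    # Step 1 (JobsCTE): order everything by time.
    parsed = sorted((datetime.strptime(t, "%d/%m/%Y %H:%M"), job) for job, t in rows)
    per_job_counter = defaultdict(int)
    groups = {}  # (job, group_number) -> [first, last]
    for row_number, (when, job) in enumerate(parsed, start=1):
        # Step 2 (GroupsCTE): global row number minus per-job row number
        # stays constant while the same job keeps appearing consecutively.
        per_job_counter[job] += 1
        group_number = row_number - per_job_counter[job]
        first_last = groups.setdefault((job, group_number), [when, when])
        first_last[1] = when  # rows arrive in time order, so this is the latest so far
    # Final step: one output row per (job, group), ordered by first weigh.
    result = [(job, first, last) for (job, _), (first, last) in groups.items()]
    return sorted(result, key=lambda r: r[1])

for job, first, last in contiguous_ranges(rows):
    print(job, first, last)
```

For the sample data the group numbers come out as 0 for the first run of job 100, 3 for job 200, 5 for job 300, and 4 for the second run of job 100, which is why grouping on both JobNumber and GroupNumber keeps the two runs of job 100 apart.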
Answer 1 (score: 0)
You have to use a self-join to create a pseudo table containing the first and last rows of each set.
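-- "table" stands in for your table name; id is assumed to be a gap-free
-- integer key that follows TimeOfWeigh order.
Select f.JobNumber,
    f.TimeOfWeigh FirstWeigh,
    l.TimeOfWeigh LastWeigh
From table f -- for first record
    join table l -- for last record
        on l.JobNumber = f.JobNumber
        And l.id >= f.id -- the last row must not precede the first
        And Not exists -- no same-job row directly before f
            (Select * from table
             Where JobNumber = f.JobNumber
             And id = f.id-1)
        And Not exists -- no same-job row directly after l
            (Select * from table
             Where JobNumber = f.JobNumber
             And id = l.id+1)
        And Not Exists -- no other job between f and l
            (Select * from table
             Where JobNumber <> f.JobNumber
             And id Between f.Id and l.Id)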
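The conditions of this self-join can be mirrored in a short Python sketch to see which (first, last) pairs survive. This is an illustration under assumptions: `id` is taken to be a gap-free integer key in time order (the question's table has no such column), the function name `run_boundaries` is hypothetical, and the sketch also requires that the last row does not precede the first, which the pairing needs in order to avoid matching the start of a later run with the end of an earlier one.

```python
# Hypothetical (id, job_number) rows: ids are gap-free and follow time order.
rows = [
    (1, 100), (2, 100), (3, 100),
    (4, 200), (5, 200),
    (6, 300), (7, 300),
    (8, 100), (9, 100), (10, 100),
]

def run_boundaries(rows):
    """Return (job, first_id, last_id) for each contiguous run of a job."""
    job_by_id = dict(rows)
    pairs = []
    for f_id, f_job in rows:
        # f must start a run: no same-job row directly before it.
        if job_by_id.get(f_id - 1) == f_job:
            continue
        for l_id, l_job in rows:
            # Join condition: same job, and l is not earlier than f.
            if l_job != f_job or l_id < f_id:
                continue
            # l must end a run: no same-job row directly after it.
            if job_by_id.get(l_id + 1) == l_job:
                continue
            # No row of a different job may sit between f and l.
            if any(job_by_id[i] != f_job for i in range(f_id, l_id + 1)):
                continue
            pairs.append((f_job, f_id, l_id))
    return pairs

print(run_boundaries(rows))
```

The nested loops make this sketch quadratic; it is meant to show the logic of the three NOT EXISTS checks, not to say anything about how the database would execute the join.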
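Answer 2 (score: 0)
This one fascinated me when I saw it, and I wondered how I would go about solving it. I was too busy to get an answer in first. I got it working later, but it has been sitting for a few days since! A few days on I still understand what I designed, which is a good sign :)
I've added some extra data at the end to show that this works with single-row JobNumber entries rather than assuming weighs always come in batches, but the first rows in the results match the original solution.
This approach also uses cascading CTEs (one more than the accepted answer here, but I won't let that discourage me!), the first of which is the test data setup: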
With Weighs AS -- sample data
(
SELECT 100 AS JobNumber, '01/01/2014 08:00' AS TimeOfWeigh UNION ALL
SELECT 100 AS JobNumber, '01/01/2014 09:00' AS TimeOfWeigh UNION ALL
SELECT 100 AS JobNumber, '01/01/2014 10:00' AS TimeOfWeigh UNION ALL
SELECT 200 AS JobNumber, '01/01/2014 12:00' AS TimeOfWeigh UNION ALL
SELECT 200 AS JobNumber, '01/01/2014 13:00' AS TimeOfWeigh UNION ALL
SELECT 300 AS JobNumber, '01/01/2014 15:00' AS TimeOfWeigh UNION ALL
SELECT 300 AS JobNumber, '01/01/2014 16:00' AS TimeOfWeigh UNION ALL
SELECT 100 AS JobNumber, '02/01/2014 08:00' AS TimeOfWeigh UNION ALL
SELECT 100 AS JobNumber, '02/01/2014 09:00' AS TimeOfWeigh UNION ALL
SELECT 100 AS JobNumber, '03/01/2014 10:00' AS TimeOfWeigh UNION ALL
SELECT 400 AS JobNumber, '04/01/2014 14:00' AS TimeOfWeigh UNION ALL
SELECT 300 AS JobNumber, '04/01/2014 14:30' AS TimeOfWeigh
)
,
Numbered AS -- add on a unique consecutive row number
( SELECT *, ROW_NUMBER() OVER (ORDER BY TimeOfWeigh) AS ID FROM Weighs )
,
GroupEnds AS -- add on a 1/0 flag for whether it's the first or last in a run
( SELECT *,
CASE WHEN -- next row is different JobNumber?
(SELECT ID FROM Numbered n2 WHERE n2.ID=n1.ID+1 AND n2.JobNumber=n1.JobNumber) IS NULL
THEN 1 ELSE 0 END AS GroupEnd,
CASE WHEN -- previous row is different JobNumber?
(SELECT ID FROM Numbered n2 WHERE n2.ID=n1.ID-1 AND n2.JobNumber=n1.JobNumber) IS NULL
THEN 1 ELSE 0 END AS GroupBegin
FROM Numbered n1
)
,
Begins_and_Ends AS -- make sure there are always matching pairs
( SELECT * FROM GroupEnds WHERE GroupBegin=1
UNION ALL
SELECT * FROM GroupEnds WHERE GroupEnd=1
)
,
Pairs AS -- give matching pairs the same ID number for GROUPing next..
( SELECT *, (1+Row_Number() OVER (ORDER BY ID))/2 AS PairID
FROM Begins_and_Ends
)
SELECT
Min(JobNumber) AS JobNumber,
Min(TimeOfWeigh) as [First Weigh],
Max(TimeOfWeigh) as [Last Weigh]
FROM Pairs
GROUP BY PairID
ORDER BY PairID
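The Numbered CTE is fairly obvious: it gives each row an ordered ID number.
The GroupEnds CTE adds a pair of booleans (1 or 0) saying whether the row is the first or the last in a run of JobNumbers, by checking whether the next or previous row has the same JobNumber.
From there I just needed a way to pair up adjacent GroupBegins and GroupEnds. I had been using the N-tile ranking function NTILE() to generate these numbers, counting the GroupEnds rows, dividing the row count by 2, and passing that result as the argument to NTILE(). But this broke when there was an odd number of rows due to single-row batches, where the same row is both the beginning and the end of a batch.
I got around this by guaranteeing the same number of Begin and End rows: a UNION ALL of the Begin rows and the End rows, even if some of them are the same row. This is the Begins_and_Ends CTE.
The Pairs CTE adds the pair numbers using Row_Number() divided by 2; the integer result PairID is the same for both rows of a pair.
That gives us the following, with all the rows from the middle of JobNumber batches filtered out: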
JOBNUMBER TIMEOFWEIGH ID End? Begin PairID
100 01/01/2014 08:00 1 0 1 1
100 01/01/2014 10:00 3 1 0 1
200 01/01/2014 12:00 4 0 1 2
200 01/01/2014 13:00 5 1 0 2
300 01/01/2014 15:00 6 0 1 3
300 01/01/2014 16:00 7 1 0 3
100 02/01/2014 08:00 8 0 1 4
100 03/01/2014 10:00 10 1 0 4
400 04/01/2014 14:00 11 1 1 5
400 04/01/2014 14:00 11 1 1 5
300 04/01/2014 14:30 12 1 1 6
300 04/01/2014 14:30 12 1 1 6
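From there it's a final piece of cake to GROUP BY PairID and grab the first and last weigh times. I enjoyed this challenge, and I wonder if anyone else thinks it's useful in any weigh!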
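The begin/end flagging and the pairing by integer division can be checked with a small Python sketch. It is only an illustration of the steps described above: the function name `pair_run_ends` is hypothetical, and row positions stand in for the weigh times.

```python
# Job numbers in time order, including the two extra single-row runs (400, 300).
jobs = [100, 100, 100, 200, 200, 300, 300, 100, 100, 100, 400, 300]

def pair_run_ends(jobs):
    """Return (job, first_id, last_id) per run, following the CTE chain."""
    numbered = list(enumerate(jobs, start=1))  # Numbered: (ID, JobNumber)

    def job_at(i):
        return jobs[i - 1] if 1 <= i <= len(jobs) else None

    # GroupEnds: a row begins a run if the previous row has a different job,
    # and ends a run if the next row has a different job.
    begins = [(i, j) for i, j in numbered if job_at(i - 1) != j]
    ends = [(i, j) for i, j in numbered if job_at(i + 1) != j]

    # Begins_and_Ends: UNION ALL, so a single-row run contributes two rows.
    begins_and_ends = sorted(begins + ends)

    # Pairs: (1 + row_number) / 2 with integer division gives 1, 1, 2, 2, ...
    pairs = {}
    for row_number, (i, j) in enumerate(begins_and_ends, start=1):
        pair_id = (1 + row_number) // 2
        pairs.setdefault(pair_id, []).append((i, j))

    # Final SELECT: min/max within each PairID.
    return [
        (min(j for _, j in members), min(i for i, _ in members), max(i for i, _ in members))
        for _, members in sorted(pairs.items())
    ]

print(pair_run_ends(jobs))
```

Because every run contributes exactly one begin row and one end row, the combined list always has an even length, which is what makes the divide-by-2 pairing safe for single-row runs.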
http://sqlfiddle.com/#!3/b4f39/48
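Answer 3 (score: 0)
Yes, this is a fascinating brain teaser. Thanks for sharing. I'd like to propose a solution that involves no EXISTS or JOINs.
First, I created a table with a job_id (j_id) and an integer value used for ordering (j_v). Integers are easier to type, and the logic is exactly the same as for datetimes.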
select * from j order by j_v;
j_id | j_v
------+-----
100 | 1
100 | 2
100 | 2
100 | 2
100 | 2
100 | 3
200 | 4
200 | 5
300 | 6
300 | 6
300 | 6
300 | 7
300 | 7
100 | 8
100 | 9
(15 rows)
I used window functions and 3 CTEs.
Here you go:
with X AS (
select j_id, j_v,
coalesce ( lag(j_id,1) OVER (MY_W), -1) as j_id_lag,
lag(j_v,1) over (MY_W) as j_v_lag,
coalesce ( lead(j_id,1) OVER (MY_W), -1) as j_id_lead,
lead(j_v,1) over (MY_W) as j_v_lead
from j
WINDOW MY_W as ( ORDER BY j_v)
order by j_v
),
Y AS (
select *
from X
where j_id_lag != j_id_lead
),
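Z AS (
-- OVER () relies on Y arriving in j_v order (from the ORDER BY in X);
-- OVER (ORDER BY j_v) would make that ordering explicit.
select * ,
lead(j_v) OVER () AS L2,
row_number() OVER () as my_row
from Y
)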
SELECT j_id, j_v as job_start ,l2 as job_end
from Z
where my_row %2 = 1
;
j_id | job_start | job_end
------+-----+----
100 | 1 | 3
200 | 4 | 5
300 | 6 | 7
100 | 8 | 9
(4 rows)
Here is the query plan:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
CTE Scan on z (cost=325.94..379.17 rows=11 width=12) (actual time=0.047..0.071 rows=4 loops=1)
Filter: ((my_row % 2::bigint) = 1)
Rows Removed by Filter: 4
CTE x
-> WindowAgg (cost=149.78..203.28 rows=2140 width=8) (actual time=0.027..0.039 rows=15 loops=1)
-> Sort (cost=149.78..155.13 rows=2140 width=8) (actual time=0.019..0.019 rows=15 loops=1)
Sort Key: j.j_v
Sort Method: quicksort Memory: 25kB
-> Seq Scan on j (cost=0.00..31.40 rows=2140 width=8) (actual time=0.004..0.006 rows=15 loops=1)
CTE y
-> CTE Scan on x (cost=0.00..48.15 rows=2129 width=24) (actual time=0.031..0.050 rows=8 loops=1)
Filter: (j_id_lag <> j_id_lead)
Rows Removed by Filter: 7
CTE z
-> WindowAgg (cost=0.00..74.51 rows=2129 width=24) (actual time=0.042..0.062 rows=8 loops=1)
-> CTE Scan on y (cost=0.00..42.58 rows=2129 width=24) (actual time=0.031..0.052 rows=8 loops=1)
Total runtime: 0.122 ms
(17 rows)
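As you can see, there is only one sort (ordering the data by the sequence value, or by time in the original question) and a few CTE scans, but no joins. The complexity, N log N, is exactly the kind I was after.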
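The lag/lead filtering used in the three CTEs above can be traced in a minimal Python sketch. It is an illustration only: the function name `runs_via_lag_lead` is hypothetical, and -1 is used as the "no neighbour" value exactly as the coalesce calls do.

```python
# The (j_id, j_v) rows from table j above.
j = [
    (100, 1), (100, 2), (100, 2), (100, 2), (100, 2), (100, 3),
    (200, 4), (200, 5),
    (300, 6), (300, 6), (300, 6), (300, 7), (300, 7),
    (100, 8), (100, 9),
]

def runs_via_lag_lead(rows):
    """Return (j_id, job_start, job_end) using the X -> Y -> Z steps."""
    ordered = sorted(rows, key=lambda r: r[1])  # X: one sort by j_v
    ids = [j_id for j_id, _ in ordered]
    boundary = []  # Y: keep rows whose neighbours carry different job ids
    for k, (j_id, j_v) in enumerate(ordered):
        lag = ids[k - 1] if k > 0 else -1
        lead = ids[k + 1] if k + 1 < len(ids) else -1
        if lag != lead:
            boundary.append((j_id, j_v))
    result = []  # Z: odd-numbered rows are starts; the next row holds the end
    for k in range(0, len(boundary), 2):
        job_end = boundary[k + 1][1] if k + 1 < len(boundary) else None
        result.append((boundary[k][0], boundary[k][1], job_end))
    return result

print(runs_via_lag_lead(j))
```

The sketch keeps 8 boundary rows for this data, in line with the rows=8 shown for the scan of y in the plan. One thing it makes easy to try: with jobs 100, 200, 100 in consecutive single rows, the middle row has the same job on both sides and is dropped by the lag != lead filter, so this filter appears to assume that every run has at least two rows.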