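I am trying to predict the start and finish times of order processes in SQL. I have determined the average duration of past processes. The processes run in several parallel rows (RNr), and the rows are independent of each other. Each row can contain 1 to 30 processes (PNr) of differing duration. The actual duration of a process can vary; only its average duration is known. As soon as one process finishes, the next one starts automatically, so the Finish of PNr 1 equals the Start of PNr 2.
The start time of the first process in each row is known from the outset and is the same for every row. Once a process has finished, its actual time is known and should be used to calculate a more accurate prediction for the upcoming processes. How can I predict when each process will start or finish?
I used a large subquery to get this table: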
RNr  PNr  Duration_avg_h  Start                Finish
1    1    1               2019-06-06 16:32:11  2019-06-06 16:33:14
1    2    262             2019-06-06 16:33:14  NULL
1    3    51              NULL                 NULL
1    4    504             NULL                 NULL
1    5    29              NULL                 NULL
2    1    1               2019-06-06 16:32:11  NULL
2    2    124             NULL                 NULL
2    3    45              NULL                 NULL
2    4    89              NULL                 NULL
2    5    19              NULL                 NULL
2    6    1565            NULL                 NULL
2    7    24              NULL                 NULL
Now I want to find the predicted values:
SELECT
RNr,
PNr,
Duration_avg_h,
Start,
Finish,
Predicted_Start = CASE
WHEN Start IS NULL
THEN DATEADD(HH,LAG(Duration_avg_h, 1,NULL) OVER (ORDER BY RNr,PNr), LAG(Start, 1,NULL) OVER (ORDER BY RNr,PNr))
ELSE Start END,
Predicted_Finish = CASE
WHEN Finish IS NULL
THEN DATEADD(HH,Duration_avg_h,Start)
ELSE Finish END,
SUM(Duration_avg_h) over (PARTITION BY RNr ORDER BY RNr, PNr ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Duration_row_h
FROM (...)
ORDER BY RNr, PNr
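I tried LAG(), but it only gives me a value for the next row. I also could not get anywhere with "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW".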
RNr  PNr  Duration_avg_h  Start                Finish               Predicted_Start      Predicted_Finish     Duration_row_h
1    1    1               2019-06-06 16:32:11  2019-06-06 16:33:14  2019-06-06 16:32:11  2019-06-06 16:33:14  1
1    2    262             2019-06-06 16:33:14  NULL                 2019-06-06 16:33:14  2019-06-17 14:33:14  263
1    3    51              NULL                 NULL                 2019-06-17 14:33:14  NULL                 314
1    4    504             NULL                 NULL                 NULL                 NULL                 818
1    5    29              NULL                 NULL                 NULL                 NULL                 847
2    1    1               2019-06-06 16:32:11  NULL                 2019-06-06 16:32:11  2019-06-06 17:32:11  1
2    2    124             NULL                 NULL                 2019-06-06 17:32:11  NULL                 125
2    3    45              NULL                 NULL                 NULL                 NULL                 170
2    4    89              NULL                 NULL                 NULL                 NULL                 259
2    5    19              NULL                 NULL                 NULL                 NULL                 278
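Can anyone help me fill in the Predicted_Start and Predicted_Finish columns?
Answer 0 (score: 1)
LAG only works here if every row already has a value. For this use case you need to cascade the result from one row to the next. One way to do that is to get a running total via a self-join: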
--Sample Data
DECLARE @dataset TABLE
(
RNr INT
,PNr INT
,Duration_avg_h INT
,START DATETIME
,Finish DATETIME
)
INSERT INTO @dataset
(
RNr
,PNr
,Duration_avg_h
,START
,Finish
)
VALUES
(1, 1, 1, '2019-06-06 16:32:11',NULL)
,(1, 2, 262, NULL,NULL)
,(1, 3, 51, NULL,NULL)
,(1, 4, 504, NULL,NULL)
,(1, 5, 29, NULL,NULL)
,(2, 1, 1, '2019-06-06 16:32:11', NULL)
,(2, 2, 124, NULL,NULL)
,(2, 3, 45, NULL,NULL)
,(2, 4, 89, NULL,NULL)
,(2, 5, 19, NULL,NULL)
,(2, 6, 1565, NULL,NULL)
,(2, 7, 24, NULL,NULL)
SELECT
d.RNr,
d.PNr,
d.Duration_avg_h,
d.Start,
d.Finish,
--SUM() gives us the total time up to and including this step;
--take off the current step and you get the total time of all the previous steps.
--This gives us our start time, i.e. when the previous step ended.
SUM(running_total.Duration_avg_h) - d.Duration_avg_h AS running_total_time,
--MIN() gives us the earliest start time we have per row (RNr).
MIN(running_total.Start) AS min_start,
ISNULL(
d.Start
--start = first start in the row + durations of all previous steps
,DATEADD(HH,SUM(running_total.Duration_avg_h) - d.Duration_avg_h,MIN(running_total.Start) )
) AS Predicted_Start,
ISNULL(
d.Finish
--finish = first start in the row + durations up to and including this step
,DATEADD(HH,SUM(running_total.Duration_avg_h),MIN(running_total.Start) )
) AS Predicted_Finish
FROM @dataset AS d
LEFT JOIN @dataset AS running_total
ON d.RNr = running_total.RNr
AND
--the running total for all steps.
running_total.PNr <= d.PNr
GROUP BY
d.RNr,
d.PNr,
d.Duration_avg_h,
d.Start,
d.Finish
ORDER BY
RNr,
PNr
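Note that this code stops giving correct predictions once you have actual finish times, unless you update Duration_avg_h to the time each finished step actually took.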
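The running total from the self-join above can probably also be written with a windowed SUM, which avoids the GROUP BY. The following is only a sketch under the same limitation (only the first Start per RNr is known, no actual Finish values yet) and assumes SQL Server 2012 or later. The names totals, Duration_row_h and Row_start are chosen here for illustration; the sample rows are the first three processes of RNr 2 from the question.

```sql
-- Sketch only: windowed running total instead of a self-join.
-- Assumption: only the first Start per RNr is known and no Finish is set yet.
DECLARE @dataset TABLE
(
    RNr INT,
    PNr INT,
    Duration_avg_h INT,
    Start DATETIME2(0),
    Finish DATETIME2(0)
);

INSERT INTO @dataset (RNr, PNr, Duration_avg_h, Start, Finish)
VALUES
    (2, 1,   1, '2019-06-06 16:32:11', NULL),
    (2, 2, 124, NULL, NULL),
    (2, 3,  45, NULL, NULL);

WITH totals AS
(
    SELECT
        RNr, PNr, Duration_avg_h, Start, Finish,
        -- hours from the beginning of the row up to and including this process
        SUM(Duration_avg_h) OVER (PARTITION BY RNr ORDER BY PNr
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Duration_row_h,
        -- earliest known start of the row (NULLs are ignored by MIN)
        MIN(Start) OVER (PARTITION BY RNr) AS Row_start
    FROM @dataset
)
SELECT
    RNr, PNr, Duration_avg_h, Start, Finish,
    -- start = row start + durations of all previous processes
    ISNULL(Start,  DATEADD(HOUR, Duration_row_h - Duration_avg_h, Row_start)) AS Predicted_Start,
    -- finish = row start + durations up to and including this process
    ISNULL(Finish, DATEADD(HOUR, Duration_row_h, Row_start)) AS Predicted_Finish
FROM totals
ORDER BY RNr, PNr;
```

For PNr 2 this should give a predicted start of 2019-06-06 17:32:11 and a predicted finish of 2019-06-11 21:32:11 (1 h and 125 h after the row start). Partitioning by RNr is what keeps the totals of one row from leaking into the next.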
Answer 1 (score: 0)
Jonathan, thank you for your help.
Your idea of using "MIN(running_total.Start) AS min_start" led me to the idea of using "MAX(d.Start) OVER (PARTITION BY RNr)". That resulted in the following query:
--Sample Data
DECLARE @dataset TABLE
(
RNr INT
,PNr INT
,Duration_avg_h INT
,START DATETIME
,Finish DATETIME
)
INSERT INTO @dataset
(
RNr
,PNr
,Duration_avg_h
,START
,Finish
)
VALUES
(1, 1, 1, '2019-06-06 16:32:11','2019-06-06 16:33:14')
,(1, 2, 262, '2019-06-06 16:33:14','2019-08-22 17:30:00')
,(1, 3, 51, '2019-08-22 17:30:00',NULL)
,(1, 4, 504, NULL,NULL)
,(1, 5, 29, NULL,NULL)
,(2, 1, 1, '2019-06-06 16:32:11', NULL)
,(2, 2, 124, NULL,NULL)
,(2, 3, 45, NULL,NULL)
,(2, 4, 89, NULL,NULL)
,(2, 5, 19, NULL,NULL)
,(2, 6, 1565, NULL,NULL)
,(2, 7, 24, NULL,NULL)
SELECT RNr,
PNr,
Duration_avg_h,
Start,
Finish,
--Start_max,
--Finish_bit,
--Duration_row_h,
CASE WHEN Start IS NOT NULL THEN Start ELSE DATEADD(HH,(Duration_row_h - MAX(Duration_row_h*Finish_bit) OVER (PARTITION BY RNr) - Duration_avg_h), Start_max) END as Predicted_Start,
CASE WHEN Finish IS NOT NULL THEN Finish ELSE DATEADD(HH,(Duration_row_h - MAX(Duration_row_h*Finish_bit) OVER (PARTITION BY RNr)), Start_max) END as Predicted_Finish
FROM ( SELECT
RNr,
PNr,
Duration_avg_h,
--Convert to a short DATETIME format
CONVERT(DATETIME2(0),Start) as Start,
CONVERT(DATETIME2(0),Finish) as Finish,
--Get MAX start time for each row
Start_max = MAX (CONVERT(DATETIME2(0),d.Start)) OVER (PARTITION BY RNr),
--If process is finished then 1
Finish_bit = (CASE WHEN d.Finish IS NULL THEN 0 ELSE 1 END),
--continuously count the Duration of all processes in the row
SUM(Duration_avg_h) over (PARTITION BY RNr ORDER BY RNr, PNr ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Duration_row_h
FROM @dataset AS d
) AS e
ORDER BY
RNr,
PNr
This query takes the actual start and finish times into account and calculates the predictions for the upcoming processes from them.
RNr  PNr  Duration_avg_h  Start                Finish               Predicted_Start      Predicted_Finish
1    1    1               2019-06-06 16:32:11  2019-06-06 16:33:14  2019-06-06 16:32:11  2019-06-06 16:33:14
1    2    262             2019-06-06 16:33:14  2019-08-22 17:30:00  2019-06-06 16:33:14  2019-08-22 17:30:00
1    3    51              2019-08-22 17:30:00  NULL                 2019-08-22 17:30:00  2019-08-24 20:30:00
1    4    504             NULL                 NULL                 2019-08-24 20:30:00  2019-09-14 20:30:00
1    5    29              NULL                 NULL                 2019-09-14 20:30:00  2019-09-16 01:30:00
2    1    1               2019-06-06 16:32:11  NULL                 2019-06-06 16:32:11  2019-06-06 17:32:11
2    2    124             NULL                 NULL                 2019-06-06 17:32:11  2019-06-11 21:32:11
2    3    45              NULL                 NULL                 2019-06-11 21:32:11  2019-06-13 18:32:11
2    4    89              NULL                 NULL                 2019-06-13 18:32:11  2019-06-17 11:32:11
2    5    19              NULL                 NULL                 2019-06-17 11:32:11  2019-06-18 06:32:11
2    6    1565            NULL                 NULL                 2019-06-18 06:32:11  2019-08-22 11:32:11
2    7    24              NULL                 NULL                 2019-08-22 11:32:11  2019-08-23 11:32:11
I think this approach is still rather complicated. Does anyone know a simpler query?
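One possibly simpler formulation, offered only as a sketch: take the latest known point in time per row as an anchor and add a running total of the average durations of the processes that have not finished yet. It assumes that the finished processes form an unbroken prefix of each row and that the Finish of one process equals the Start of the next, as described in the question. The names anchored, Anchor_time and Remaining_h are made up for this example; the sample data is the one used in the query above.

```sql
-- Sketch only. Assumptions: finished processes are an unbroken prefix of each RNr,
-- and Finish of one process = Start of the next.
DECLARE @dataset TABLE
(
    RNr INT,
    PNr INT,
    Duration_avg_h INT,
    Start DATETIME2(0),
    Finish DATETIME2(0)
);

INSERT INTO @dataset (RNr, PNr, Duration_avg_h, Start, Finish)
VALUES
    (1, 1,    1, '2019-06-06 16:32:11', '2019-06-06 16:33:14'),
    (1, 2,  262, '2019-06-06 16:33:14', '2019-08-22 17:30:00'),
    (1, 3,   51, '2019-08-22 17:30:00', NULL),
    (1, 4,  504, NULL, NULL),
    (1, 5,   29, NULL, NULL),
    (2, 1,    1, '2019-06-06 16:32:11', NULL),
    (2, 2,  124, NULL, NULL),
    (2, 3,   45, NULL, NULL),
    (2, 4,   89, NULL, NULL),
    (2, 5,   19, NULL, NULL),
    (2, 6, 1565, NULL, NULL),
    (2, 7,   24, NULL, NULL);

WITH anchored AS
(
    SELECT
        RNr, PNr, Duration_avg_h, Start, Finish,
        -- latest known point in time for the row: the last actual Finish,
        -- or the first Start of the row when nothing has finished yet
        ISNULL(MAX(Finish) OVER (PARTITION BY RNr),
               MIN(Start)  OVER (PARTITION BY RNr)) AS Anchor_time,
        -- running total of average durations, counting only unfinished processes
        SUM(CASE WHEN Finish IS NULL THEN Duration_avg_h ELSE 0 END)
            OVER (PARTITION BY RNr ORDER BY PNr
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Remaining_h
    FROM @dataset
)
SELECT
    RNr, PNr, Duration_avg_h, Start, Finish,
    -- actual values win; otherwise count forward from the anchor
    ISNULL(Start,  DATEADD(HOUR, Remaining_h - Duration_avg_h, Anchor_time)) AS Predicted_Start,
    ISNULL(Finish, DATEADD(HOUR, Remaining_h, Anchor_time)) AS Predicted_Finish
FROM anchored
ORDER BY RNr, PNr;
```

On this sample data the sketch should reproduce the predictions in the table above, for example 2019-08-24 20:30:00 as the predicted finish of RNr 1, PNr 3 (51 h after the last actual finish). If finished and unfinished processes can be interleaved within a row, the anchor logic would need to be reworked.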