Grouping with FIRST_VALUE() / LAST_VALUE()

Date: 2019-05-16 17:56:46

Tags: sql sql-server

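I have timestamped transaction data that covers several job numbers. Sometimes a job is split into two parts, with another job running in between. I want to report each of these split runs on its own row, with its own statistics.

I have tried many different window-based solutions, and FIRST_VALUE() / LAST_VALUE() looks like my best option. I want the first and last transaction times of each job as columns, so that I can group by them and show the number of transactions.

When I use these functions, LastKit and FirstKit behave as if I had done a GROUP BY, even though I partition by job. I do want the grouping, but partitioned by each run of the job.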

select  FIRST_VALUE(DTIMECRE) OVER(PARTITION BY job  ORDER BY dtimecre) AS KitStart,
LAST_VALUE(DTIMECRE)  OVER(PARTITION BY job  ORDER BY dtimecre 
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS KitEnd,
count(*) as QtyKitted
from transactions
order by dtimecre

    KitStart            KitEnd              Job     dtimecre            SystemicLocation
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:07    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:08    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:09    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:10    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:10    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:11    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:12    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:13    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:13    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:14    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:15    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:16    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:46    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:47    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:48    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:49    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:49    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:06:17    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:12:16    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:12:26    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:12:32    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:12:39    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:12:45    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:13:38    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:13:45    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:13:50    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:13:55    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:14:00    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:14:06    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:14:11    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:35:51    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:35:51    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:35:52    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:36:23    3
    Lots of transactions……              
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 15:17:19    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 15:17:19    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 15:17:20    3

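Looking at the data: the first KitStart is 11:00:07, and it changes to 11:06:17 when the next job (979309) appears. But when job 978437 starts again, KitStart goes back to 11:00:07. I want it to be the first transaction of that kitting run, i.e. 11:35:51.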

LAST_VALUE has the same problem. On the first transaction it returns 15:17:20, which is the end of the second run of job 978437. I want it to be 11:00:49.

To summarize, my desired output would look like:

    KitStart            KitEnd              Job    QtyKitted
    5/15/19 11:00:07    5/15/19 11:00:49    978437  17
    5/15/19 11:06:17    5/15/19 11:14:11    979309  13
    5/15/19 11:35:51    5/15/19 15:17:20    978437  1007

This shows that job 978437 started and ran 17 units, then job 979309 ran 13 units, and then job 978437 resumed and ran 1007 units.

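Also, this is my first SQL Server post (and only my second post overall), so I would appreciate hearing about anything that does not follow Stack Overflow posting conventions; there are probably a few. Thanks!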

2 Answers:

Answer 0: (score: 1)

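Thanks to "Group consecutive rows of same value using time spans", I found the answer.

I added another column (Ranker) that numbers each non-consecutive occurrence of prod_id over time, so that each occurrence can be grouped separately.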

with A as (
    -- Carry the previous row's prod_id (in time order) alongside each row.
    select prod_id, sku, dtimecre, systemiclocation,
           prevProd_id = lag(prod_id, 1, prod_id) over (order by dtimecre)
    from transactions
),
B as (
    -- Running count of prod_id changes: each uninterrupted run gets its own Ranker value.
    select prod_id, sku, dtimecre, systemiclocation,
           Ranker = sum(case when prod_id = prevProd_id then 0 else 1 end)
                        over (order by dtimecre)
    from A
)

select prod_id, sku,
       min(dtimecre) as KitStart,
       max(dtimecre) as KitEnd,
       count(*)      as QtyKitted
from B
group by prod_id, sku, Ranker
order by min(dtimecre)

This produces:

prod_id KitStart                    KitEnd                      QtyKitted
978437  2019-05-15 11:00:07.0000000 2019-05-15 11:00:49.0000000 17
979309  2019-05-15 11:06:17.0000000 2019-05-15 11:14:11.0000000 13
978437  2019-05-15 11:35:51.0000000 2019-05-15 15:17:20.0000000 1007
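
To try this approach without access to the real transactions table, here is a minimal self-contained sketch in T-SQL. The handful of rows in sample_tx are hypothetical stand-ins picked from the timestamps in the question, and the names sample_tx, flagged, numbered and run_no are made up for illustration. It assumes SQL Server 2012 or later, where LAG and framed window aggregates are available.

```sql
-- Hypothetical sample rows standing in for the transactions table.
WITH sample_tx AS (
    SELECT prod_id, CAST(dt AS datetime2(0)) AS dtimecre
    FROM (VALUES
        (978437, '2019-05-15T11:00:07'),
        (978437, '2019-05-15T11:00:08'),
        (978437, '2019-05-15T11:00:49'),
        (979309, '2019-05-15T11:06:17'),
        (979309, '2019-05-15T11:14:11'),
        (978437, '2019-05-15T11:35:51'),
        (978437, '2019-05-15T15:17:20')
    ) AS v (prod_id, dt)
),
flagged AS (
    -- Previous prod_id in time order; the first row defaults to its own prod_id.
    SELECT prod_id, dtimecre,
           LAG(prod_id, 1, prod_id) OVER (ORDER BY dtimecre) AS prev_prod_id
    FROM sample_tx
),
numbered AS (
    -- Running count of changes in prod_id: one run_no per uninterrupted run.
    SELECT prod_id, dtimecre,
           SUM(CASE WHEN prod_id = prev_prod_id THEN 0 ELSE 1 END)
               OVER (ORDER BY dtimecre ROWS UNBOUNDED PRECEDING) AS run_no
    FROM flagged
)
SELECT prod_id,
       MIN(dtimecre) AS KitStart,
       MAX(dtimecre) AS KitEnd,
       COUNT(*)      AS QtyKitted
FROM numbered
GROUP BY prod_id, run_no
ORDER BY KitStart;
```

With these seven sample rows the query should return three rows: job 978437 with 3 transactions, job 979309 with 2, and job 978437 again with 2. Grouping by run_no in addition to prod_id is what keeps the two runs of 978437 apart. If two different jobs can share the exact same timestamp, the ORDER BY would probably need a tie-breaking column, which the sample does not model.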

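Answer 1: (score: 0)

The question is a bit confusing, but it looks to me like you want the MIN(KitStart) and MAX(KitEnd) for each Job, regardless of interruptions. If that is the case, a GROUP BY query should meet these requirements.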

SELECT MIN(DTIMECRE) AS KitStart,
       MAX(DTIMECRE) AS KitEnd,
       job,
       sku,
       SystemicLocation
FROM transactions
GROUP BY job,
         sku,
         SystemicLocation
-- DTIMECRE itself is not in the GROUP BY, so sort by the aggregate instead.
ORDER BY MIN(DTIMECRE)

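However, when you do this you cannot show DTIMECRE as a standalone column. With the data provided, you would probably get one row per job, which is not what is wanted.

To work around this, I suggest using common table expressions (CTEs). I am assuming that Job is a unique identifier and that you do not reuse one job number for multiple jobs. If that is not the case, you can adapt the query yourself.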

WITH MINDTimeCRE AS
(SELECT MIN(DTIMECRE) DTIMECRE,
       Job
       FROM transactions
       GROUP BY Job)
,MAXDTimeCRE AS 
(SELECT MAX(DTIMECRE) DTIMECRE,
       Job
       FROM transactions
       GROUP BY Job)
-- Job and DTIMECRE exist in all three sources, so qualify them to avoid ambiguity.
SELECT MINDTimeCRE.DTIMECRE AS KitStart,
       MAXDTimeCRE.DTIMECRE AS KitEnd,
       transactions.job, 
       transactions.sku,  
       transactions.DTIMECRE,
       transactions.SystemicLocation 
FROM transactions
  LEFT JOIN MINDTimeCRE 
  ON transactions.Job = MINDTimeCRE.Job
    LEFT JOIN MAXDTimeCRE 
    ON transactions.Job = MAXDTimeCRE.Job

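This essentially treats the first two SELECT statements in the query as tables. Once those CTEs are LEFT JOINed to the main query, you can use their values as ordinary columns without aggregating.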
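
As a possible alternative to the two CTEs and joins, the same per-job minimum and maximum can be computed with window aggregates. This is only a sketch: sample_tx and its rows are hypothetical stand-ins for the transactions table, reduced to the job and DTIMECRE columns.

```sql
-- Hypothetical sample rows standing in for the transactions table.
WITH sample_tx AS (
    SELECT job, CAST(dt AS datetime2(0)) AS DTIMECRE
    FROM (VALUES
        (978437, '2019-05-15T11:00:07'),
        (978437, '2019-05-15T11:00:49'),
        (979309, '2019-05-15T11:06:17'),
        (979309, '2019-05-15T11:14:11'),
        (978437, '2019-05-15T11:35:51'),
        (978437, '2019-05-15T15:17:20')
    ) AS v (job, dt)
)
SELECT MIN(DTIMECRE) OVER (PARTITION BY job) AS KitStart,  -- earliest time for the whole job
       MAX(DTIMECRE) OVER (PARTITION BY job) AS KitEnd,    -- latest time for the whole job
       job,
       DTIMECRE
FROM sample_tx
ORDER BY DTIMECRE;
```

Like the CTE version, this works on the job as a whole: every 978437 row gets KitStart 11:00:07 and KitEnd 15:17:20, so it does not separate two runs of the same job that are interrupted by another job.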