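SQL query to find the start and end of ranges in a column with gaps

Asked: 2014-07-24 22:08:12

Tags: sql sql-server ms-access gaps-in-data

I have a table in Access with a SKU column and a Sales column. The Sales column has gaps, i.e. runs of 3 or more blanks or zeros. Zeros count as blanks and should be eliminated. A gap is defined as >= 3 consecutive blanks or zeros. For each distinct SKU, I want to find the start and end of each continuous range, and its count (end - start + 1).

Small example: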

SKU         SALES
==================
ABC        6504.00
ABC        3304.23
ABC        0
ABC        0
ABC        
ABC        
ABC        403.053
ABC        3493.00
ABC        3939.02
DEF        4935.24
DEF        3037.22
DEF        
DEF        
DEF        
DEF        392.042
DEF        0
DEF        0
DEF        3493.03
DEF        8644.40
DEF        643.035
DEF        5333.22

Result set:

SKU        RANGE     START     END    COUNT
ABC        1         1         2      2-1+1=2
ABC        2         7         9      9-7+1=3
DEF        1         10        11     11-10+1=2
DEF        2         15        21     21-15+1=7

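This result set should then be joined back to the original table to eliminate the rows of any SKU range whose count is <= 13. For each SKU, only the range with the maximum count should be kept in the table/recordset.

I am using MS Access, but could someone demonstrate this as both an Access query and a SQL Server query?

=================== EDIT =========================

Hello @Kevin,

I finally got the query working, and it gives me the correct sales week ranges, but I now need some help joining it back to the original staging table so that only the selected rows are extracted. JFYI, before running this query I updated all the Sales KPI columns to replace NULLs (blanks) with zeros.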

USE MASTER
GO

WITH Salesrows AS 
(
SELECT
    [SCOUNTRY],
    [SCHAR],
    [DESCRIPTION],
    [SALES VALUE WITH INNOVATION]=IIF([SALES VALUE WITH INNOVATION] IS NULL,0,[SALES VALUE WITH INNOVATION]),
    CONVERT(INT, SUBSTRING([WEEK], 8, 2)) Wk,
    CONVERT(INT, SUBSTRING([WEEK], 3, 4)) Yr,
    [wkno],
    ROW_NUMBER() OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY [WEEK]) RN
FROM STAGING
WHERE ([Level] = 'Item') 
)
,SalesRanges as 
(
SELECT *,        
    LAG([SALES VALUE WITH INNOVATION], 1) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY RN) L1,
    LAG([SALES VALUE WITH INNOVATION], 2) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY RN) L2,
    LEAD([SALES VALUE WITH INNOVATION], 1) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY RN) L5,
    LEAD([SALES VALUE WITH INNOVATION], 2) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY RN) L6
FROM SalesRows 
),
Clearcontents as
(
SELECT *,
    (CASE WHEN ISNULL([SALES VALUE WITH INNOVATION], 0) = 0 AND ISNULL(L1,0) = 0 AND ISNULL(L2,0) = 0  THEN 1 ELSE 0 END) RemoveMe0,
    (CASE WHEN ISNULL([SALES VALUE WITH INNOVATION], 0) = 0 AND ISNULL(L5,0) = 0 AND ISNULL(L6,0) = 0  THEN 1 ELSE 0 END) RemoveMe1,
    (CASE WHEN ISNULL([SALES VALUE WITH INNOVATION], 0) = 0 AND ISNULL(L1,0) = 0 AND L2<>0 AND ISNULL(L5,0) = 0 AND L6<>0 THEN 1 ELSE 0 END) RemoveMe2
FROM SalesRanges
),
CleanedData AS
(
SELECT *,
     ROW_NUMBER() OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY yr, RN) NewRn
FROM ClearContents
WHERE RemoveMe0 != 1 and RemoveMe1 != 1 and RemoveMe2 != 1
),
WeekGaps as 
(
SELECT *,
    (NewRn - Rn) Ref
FROM CleanedData
),
CorrectWeekPeriods as 
(
SELECT 
    [SCOUNTRY], 
    [SCHAR],
    [DESCRIPTION],
    COUNT([wkno]) AS CNTWKS,
    MIN([wkno]) AS MINWEEK,
    MAX([wkno]) AS MAXWEEK,
    REF
FROM WeekGaps
GROUP BY [SCOUNTRY],[SCHAR],[DESCRIPTION],[REF]
)
SELECT 
    C.[SCOUNTRY], 
    C.[SCHAR],
    C.[DESCRIPTION],
    CONVERT(INT, SUBSTRING(yw1.yrwk ,5,2)) WEEKS,
    C.CNTWKS, 
    yw1.yrwk AS MINWEEK, 
    yw2.yrwk AS MAXWEEK
FROM CorrectWeekPeriods AS C 
INNER JOIN yearweek AS yw1 ON C.MINWEEK = yw1.rn
INNER JOIN yearweek AS yw2 ON C.MAXWEEK = yw2.rn 
--WHERE (C.CNTWKS > 13) AND (C.CNTWKS <= 52) 
--AND (C.CNTWKS=(SELECT MAX(A.CNTWKS) FROM CorrectWeekPeriods A WHERE C.[SCOUNTRY]=A.[SCOUNTRY] AND C.[SCHAR]=A.[SCHAR] AND C.[DESCRIPTION]=A.[DESCRIPTION]))
--AND SUBSTRING(CAST(yw1.yrwk AS VARCHAR(6)),5,2) >= 1)
--AND C.Description='0241004245' 
WHERE C.Description='0241004245'
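
  1. Which fields of the CTE do I need to join to which fields of the staging table so that only the rows for these selected periods are returned?

  2. I am sure this query can be optimized and made more concise. But how?

  3. Also, if I comment out the last WHERE clause above, the one on CorrectWeekPeriods, and run the query several times, I get a different number of rows each time. I looked at the execution plan and did not see any errors.

  4. If I uncomment just this WHERE clause: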

    WHERE (C.CNTWKS > 13) AND (C.CNTWKS <= 52) 
    AND (C.CNTWKS=(SELECT MAX(A.CNTWKS) FROM CorrectWeekPeriods A WHERE C.[SCOUNTRY]=A.
    

    or this one:

    WHERE C.Description='0241004245'
    

    I get the proper min & max sales week ranges.

    1. Also, if I uncomment

      WHERE C.Description='0241004245'

    2. I get this error shown in the execution plan:

      /*
      Missing Index Details from SQL_Correct Gaps.sql - ABC.master (ALPHA\SIFAR (52))
      The Query Processor estimates that implementing the following index could improve the query cost by 97.7228%.
      */
      
      /*
      USE [master]
      GO
      CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
      ON [dbo].[staging] ([Level],[Description])
      INCLUDE ([Week],[Sales Value with Innovation],[sCountry],[sChar],[wkno])
      GO
      */
      

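      But if I keep the last WHERE clause commented out, I do not get this error. By the way, I have already created the index above, so I do not know why I am being asked to create the same index again. What could be the reason for this?

      Also, the last few commented-out lines of code are rules I tried to implement but could not write correctly. These are the rules:

      1. If a SKU has 2 or more sales week ranges, pick the largest one (preferably one that starts at week 1 of 2011).
      2. Exclude any range > 52, bringing them down to <= 52.
      3. If all of a SKU's sales week ranges are > 13 and <= 52, keep only the largest one (preferably one that starts at week 1 of 2011).
      4. Exclude any range <= 13.
      5. I hope someone can point me in the right direction (especially on my point 1, joining back to the Staging table to pull the appropriate SKU sales week ranges).

        EDIT... I just uncommented the last WHERE clause again: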

        WHERE (C.CNTWKS > 13) AND (C.CNTWKS <= 52) 
        AND (C.CNTWKS=(SELECT MAX(A.CNTWKS) FROM CorrectWeekPeriods A WHERE C.[SCOUNTRY]=A.[SCOUNTRY] AND C.[SCHAR]=A.[SCHAR] AND C.[DESCRIPTION]=A.[DESCRIPTION]))
        AND SUBSTRING(CAST(yw1.yrwk AS VARCHAR(6)),5,2) >= 1
        

        and looked at the execution plan. It shows warnings on the SORT & HASH operators. The warning message is:

        Operator used tempdb to spill data during execution with spill level 1
        

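        Every time I execute the query I get a different number of rows. The query also takes about 1 minute to run. I think it has something to do with the join to the yearweek table, but I do not know how to fix it.

        Any help would be much appreciated.

        Hello @kevin Cook,

        Here is the table definition: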

        USE [master]
        GO
        
        /****** Object:  Table [dbo].[staging]    Script Date: 8/6/2014 11:27:29 PM ******/
        DROP TABLE [dbo].[staging]
        GO
        
        /****** Object:  Table [dbo].[staging]    Script Date: 8/6/2014 11:27:29 PM ******/
        SET ANSI_NULLS ON
        GO
        
        SET QUOTED_IDENTIFIER ON
        GO
        
        SET ANSI_PADDING ON
        GO
        
        CREATE TABLE [dbo].[staging](
            [Level] [varchar](5) NULL,
            [Week] [varchar](9) NULL,
            [Category] [varchar](50) NULL,
            [Manufacturer] [varchar](50) NULL,
            [Brand] [varchar](50) NULL,
            [Description] [varchar](100) NULL,
            [EAN] [varchar](100) NULL,
            [Sales Value with Innovation] [float] NULL,
            [Sales Units with Innovation] [float] NULL,
            [Price Per Item] [float] NULL,
            [Importance Value w Innovation] [float] NULL,
            [Importance Units w Innovation] [float] NULL,
            [Numeric Distribution] [float] NULL,
            [Weighted Distribution] [float] NULL,
            [Average Number of Item] [float] NULL,
            [Value] [float] NULL,
            [Volume] [float] NULL,
            [Units] [float] NULL,
            [Sales Value New Manufacturer] [float] NULL,
            [Sales Value New Brand] [float] NULL,
            [Sales Value New Line Extension] [float] NULL,
            [Sales Value New Packaging] [float] NULL,
            [Sales Value New Size] [float] NULL,
            [Sales Value New Product Form] [float] NULL,
            [Sales Value New Style Type] [float] NULL,
            [Sales Value New Flavour Fragr] [float] NULL,
            [Sales Value New Claim] [float] NULL,
            [Sales Units New Manufacturer] [float] NULL,
            [Sales Units New Brand] [float] NULL,
            [Sales Units New Line Extension] [float] NULL,
            [Sales Units New Packaging] [float] NULL,
            [Sales Units New Size] [float] NULL,
            [Sales Units New Product Form] [float] NULL,
            [Sales Units New Style Type] [float] NULL,
            [Sales Units New Flavour Fragr] [float] NULL,
            [Sales Units New Claim] [float] NULL,
            [filename] [nvarchar](260) NULL,
            [importdate] [datetime] NULL CONSTRAINT [DF_staging_importdate]  DEFAULT (getdate()),
            [sCountry] [varchar](50) NULL,
            [sChar] [varchar](50) NULL,
            [yr] [int] NULL,
            [wk] [int] NULL,
            [wkno] [int] NULL
        ) ON [PRIMARY]
        
        GO
        
        SET ANSI_PADDING OFF
        GO
        

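1 answer:

Answer 0 (score: 1)

This works on SQL Server 2012. To make it work on 2008+, you would have to do multiple self-joins of SaleRows in the SaleRanges CTE to do the job of the LAG function. Here is some sample data: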

DECLARE @SalesTape TABLE
(   
    SKU VARCHAR(10),
    SALES DECIMAL(19,3),
    YEARWEEK VARCHAR(10)
)

INSERT INTO @SalesTape
VALUES
('ABC', 6504.00, 'W 2011 01'),
('ABC', 3304.23, 'W 2011 02'),
('ABC', 0, 'W 2011 03'),
('ABC', 0, 'W 2011 04'),
('ABC', null, 'W 2011 05'),
('ABC', null, 'W 2011 06'),
('ABC', 403.053, 'W 2011 07'),
('ABC', 3493.00, 'W 2011 08'),
('ABC', 3939.02, 'W 2011 09'),
('DEF', 4935.24, 'W 2011 10'),
('DEF', 3037.22, 'W 2011 11'),
('DEF', null, 'W 2011 12'),
('DEF', null, 'W 2011 13'),
('DEF', null, 'W 2011 14'),
('DEF', 392.042, 'W 2011 15'),
('DEF', 0, 'W 2011 16'),
('DEF', 0, 'W 2011 17'),
('DEF', 3493.03, 'W 2011 18'),
('DEF', 8644.40, 'W 2011 19'),
('DEF', 643.035, 'W 2011 20'),
('DEF', 5333.22, 'W 2011 21');

My first CTE just sets up some row numbers and sets the sales to 0 where they are null.

;WITH SaleRows AS
(
    SELECT
        SKU,
        ISNULL(SALES, 0.0) SALES,
        CONVERT(INT, SUBSTRING(YEARWEEK, 8, 2)) Wk,
        CONVERT(INT, SUBSTRING(YEARWEEK, 3, 4)) Yr,
        ROW_NUMBER() OVER (ORDER BY YEARWEEK) RN
    FROM @SalesTape
),

The second CTE builds on the first: it looks at the previous two rows and puts their sales values into columns of the CTE.

SaleRanges AS
(
    SELECT 
        SaleRows.SKU,
        SaleRows.SALES,
        SaleRows.Wk,
        SaleRows.Yr,
        SaleRows.RN,
        LAG(SALES, 2) OVER (ORDER BY RN) L2,
        LAG(SALES, 1) OVER (ORDER BY RN) L1
    FROM SaleRows 
),
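
For SQL Server 2008, where LAG is not available, the two look-back columns could be emulated with self-joins on RN. The following is only a minimal sketch under stated assumptions: the four-row sample is hypothetical and reduced for brevity, the aliases cur, p1 and p2 are made up for illustration, and the column names simply follow @SalesTape above.

```sql
-- Hypothetical reduced sample; same column names as @SalesTape in the answer
DECLARE @SalesTape TABLE
(
    SKU VARCHAR(10),
    SALES DECIMAL(19,3),
    YEARWEEK VARCHAR(10)
);

INSERT INTO @SalesTape
VALUES
('ABC', 6504.00, 'W 2011 01'),
('ABC', 0,       'W 2011 02'),
('ABC', NULL,    'W 2011 03'),
('ABC', 403.053, 'W 2011 04');

;WITH SaleRows AS
(
    SELECT
        SKU,
        ISNULL(SALES, 0.0) SALES,
        ROW_NUMBER() OVER (ORDER BY YEARWEEK) RN
    FROM @SalesTape
)
SELECT
    cur.SKU,
    cur.SALES,
    cur.RN,
    p2.SALES AS L2,  -- stands in for LAG(SALES, 2) OVER (ORDER BY RN)
    p1.SALES AS L1   -- stands in for LAG(SALES, 1) OVER (ORDER BY RN)
FROM SaleRows cur
LEFT JOIN SaleRows p1 ON p1.RN = cur.RN - 1  -- previous row, NULL on the first row
LEFT JOIN SaleRows p2 ON p2.RN = cur.RN - 2  -- two rows back, NULL on the first two rows
ORDER BY cur.RN;
```

The LEFT JOINs return NULL where no earlier row exists, which matches what LAG returns when no default is supplied.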

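Now, if my row and the previous two rows are all 0.0, flag the row for removal (this creates the break between periods). We then generate a new row number over the cleaned data for later use.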

ClearContent AS
(
    SELECT *, 
        CASE WHEN L1 = 0.0 AND L2 = 0.0 AND ISNULL(SALES, 0.00) = 0.0  THEN 1 ELSE 0 END RemoveMe
    FROM SaleRanges
),
CleanedData AS
(
    SELECT 
        *, 
        ROW_NUMBER() OVER (PARTITION BY SKU ORDER BY RN) NewRn
    FROM ClearContent
    WHERE RemoveMe != 1
)

With the invalid rows removed, we do some math, comparing the week number with the row offset to produce a logical period reference.

SELECT 
    SKU,
    SALES,
    Wk,
    Yr,
    (WK - NewRn) Ref
FROM CleanedData
WHERE SALES != 0.0

Here is the output:

SKU SALES   Wk  Yr  Ref
ABC 6504.000    1   2011    0
ABC 3304.230    2   2011    0
ABC 403.053 7   2011    2
ABC 3493.000    8   2011    2
ABC 3939.020    9   2011    2
DEF 4935.240    10  2011    9
DEF 3037.220    11  2011    9
DEF 392.042 15  2011    10
DEF 3493.030    18  2011    10
DEF 8644.400    19  2011    10
DEF 643.035 20  2011    10
DEF 5333.220    21  2011    10

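Ref identifies the groups, so you only need to take the minimum and maximum Wk for each Ref to find the first and last record. You could clean this up and simplify it, but I wanted to show the steps. Hope this helps.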
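
The last step described above, taking the minimum and maximum Wk per Ref, might be sketched as follows. This is an illustrative completion, not part of the original answer: the CTE name Periods and the output aliases RangeNo, StartWk, EndWk and WeekCount are assumptions, and the script assumes all weeks fall within a single year, as they do in the sample data.

```sql
DECLARE @SalesTape TABLE
(
    SKU VARCHAR(10),
    SALES DECIMAL(19,3),
    YEARWEEK VARCHAR(10)
);

-- Sample data repeated from the answer so the script runs on its own
INSERT INTO @SalesTape
VALUES
('ABC', 6504.00, 'W 2011 01'), ('ABC', 3304.23, 'W 2011 02'),
('ABC', 0, 'W 2011 03'),       ('ABC', 0, 'W 2011 04'),
('ABC', null, 'W 2011 05'),    ('ABC', null, 'W 2011 06'),
('ABC', 403.053, 'W 2011 07'), ('ABC', 3493.00, 'W 2011 08'),
('ABC', 3939.02, 'W 2011 09'), ('DEF', 4935.24, 'W 2011 10'),
('DEF', 3037.22, 'W 2011 11'), ('DEF', null, 'W 2011 12'),
('DEF', null, 'W 2011 13'),    ('DEF', null, 'W 2011 14'),
('DEF', 392.042, 'W 2011 15'), ('DEF', 0, 'W 2011 16'),
('DEF', 0, 'W 2011 17'),       ('DEF', 3493.03, 'W 2011 18'),
('DEF', 8644.40, 'W 2011 19'), ('DEF', 643.035, 'W 2011 20'),
('DEF', 5333.22, 'W 2011 21');

;WITH SaleRows AS
(
    SELECT
        SKU,
        ISNULL(SALES, 0.0) SALES,
        CONVERT(INT, SUBSTRING(YEARWEEK, 8, 2)) Wk,
        CONVERT(INT, SUBSTRING(YEARWEEK, 3, 4)) Yr,
        ROW_NUMBER() OVER (ORDER BY YEARWEEK) RN
    FROM @SalesTape
),
SaleRanges AS
(
    SELECT *,
        LAG(SALES, 2) OVER (ORDER BY RN) L2,
        LAG(SALES, 1) OVER (ORDER BY RN) L1
    FROM SaleRows
),
ClearContent AS
(
    -- Flag the third and later rows of any run of zeros
    SELECT *,
        CASE WHEN L1 = 0.0 AND L2 = 0.0 AND SALES = 0.0 THEN 1 ELSE 0 END RemoveMe
    FROM SaleRanges
),
CleanedData AS
(
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY SKU ORDER BY RN) NewRn
    FROM ClearContent
    WHERE RemoveMe != 1
),
Periods AS  -- hypothetical name for the answer's final SELECT
(
    SELECT SKU, Wk, Yr, (Wk - NewRn) Ref
    FROM CleanedData
    WHERE SALES != 0.0
)
SELECT
    SKU,
    ROW_NUMBER() OVER (PARTITION BY SKU ORDER BY MIN(Wk)) AS RangeNo,
    MIN(Wk) AS StartWk,
    MAX(Wk) AS EndWk,
    MAX(Wk) - MIN(Wk) + 1 AS WeekCount  -- end - start + 1, as in the question
FROM Periods
GROUP BY SKU, Ref
ORDER BY SKU, StartWk;
```

On the sample data this returns four rows: ABC weeks 1 to 2 (count 2), ABC weeks 7 to 9 (count 3), DEF weeks 10 to 11 (count 2) and DEF weeks 15 to 21 (count 7). Note that WeekCount spans the whole range, so it includes short runs of zeros (fewer than 3) inside a range, such as weeks 16 and 17 for DEF.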