How to summarize by top N categories, plus "All Others" and totals?

Date: 2018-07-10 03:49:54

Tags: sql sql-server tsql sql-server-2017

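I have some tables that list users' sales by category (each sale has at least one category, and may have several).

I can get a user's top-ranked categories, but I need statistics for the user broken down by both his/her top N categories and all the remaining categories.

I've boiled the problem down to an MCVE, as follows...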

MCVE Data Summary

Salesman    SaleID    Amount    Categories
--------    ------    ------    ------------------------------
     1         1         2      Service
     2         2         2      Software, Support_Contract
     2         3         3      Service
     2         4         1      Parts, Service, Software
     2         5         3      Support_Contract
     2         6         4      Promo_Gift, Support_Contract
     2         7        -2      Rebate, Support_Contract
     3         8         2      Software, Support_Contract
     3         9         3      Service
     3        10         1      Parts, Software
     3        11         3      Support_Contract
     3        12         4      Promo_Gift, Support_Contract
     3        13        -2      Rebate, Support_Contract

MCVE setup SQL:

CREATE TABLE Sales      ([Salesman] int, [SaleID] int, [Amount] int);
CREATE TABLE SalesTags  ([SaleID] int, [TagId] int);
CREATE TABLE Tags       ([TagId] int, [TagName] varchar(100) );

INSERT INTO Sales
    ([Salesman], [SaleID], [Amount])
VALUES
    (1, 1, 2),        (2, 6, 4),        (3, 10, 1),
    (2, 2, 2),        (2, 7, -2),       (3, 11, 3),
    (2, 3, 3),        (3, 8, 2),        (3, 12, 4),
    (2, 4, 1),        (3, 9, 3),        (3, 13, -2),
    (2, 5, 3)
;
INSERT INTO SalesTags
    ([SaleID], [TagId])
VALUES
    (1, 3),           (6, 4),           (10, 1),
    (2, 1),           (6, 5),           (10, 2),
    (2, 4),           (7, 4),           (11, 4),
    (3, 3),           (7, 6),           (12, 4),
    (4, 1),           (8, 1),           (12, 5),
    (4, 2),           (8, 4),           (13, 4),
    (4, 3),           (9, 3),           (13, 6),
    (5, 4)
;
INSERT INTO Tags
    ([TagId], [TagName])
VALUES
    (1, 'Software'),
    (2, 'Parts'),
    (3, 'Service'),
    (4, 'Support_Contract'),
    (5, 'Promo_Gift'),
    (6, 'Rebate')
;


As shown in this SQL Fiddle, I can get a user's top N tags with a query like:

WITH usersSales AS (  -- actual base CTE is much more complex
    SELECT  s.SaleID
            , s.Amount
    FROM    Sales s
    WHERE   s.Salesman = 2
)
SELECT Top 3  -- N can be 3 to 10
            t.TagName
            , COUNT (us.SaleID)     AS tagSales
            , SUM (us.Amount)       AS tagAmount
FROM        usersSales us
INNER JOIN  SalesTags st    ON st.SaleID = us.SaleID
INNER JOIN  Tags t          ON t.TagId   = st.TagId
GROUP BY    t.TagName
ORDER BY    tagAmount DESC
            , tagSales DESC
            , t.TagName

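This shows that the user's top categories are:

  1. Support_Contract
  2. Service
  3. Promo_Gift

in that order, for user 2. (And Support_Contract, Promo_Gift, Software for user 3.)

But for N = 3, the desired result is: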

  • User 2:

    Top Category        Amount    Number of Sales
    ----------------    ------    ---------------
    Support_Contract       7             4
    Service                4             2
    Promo_Gift             0             0
    - All Others -         0             0
    ============================================
    Totals                11             6
    
  • User 3:

    Top Category        Amount    Number of Sales
    ----------------    ------    ---------------
    Support_Contract       7             4
    Promo_Gift             0             0
    Software               1             1
    - All Others -         3             1
    ============================================
    Totals                11             6
    

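Where:

  1. The top category is the user's highest-ranked category for the given sales (per the query above).
  2. The Top Category in row 2 excludes sales already accounted for in row 1.
  3. The Top Category in row 3 excludes sales already accounted for in rows 1 and 2.
  4. And so on.
  5. All remaining sales (not counted in any of the top N categories) go into the - All Others - group.
  6. The totals at the bottom match the user's overall sales figures.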
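
Rule 6 gives a simple sanity check for any candidate solution: the Totals row should equal a plain aggregate over the user's sales, with no join to SalesTags (such a join would count multi-tag sales more than once). A minimal check against the MCVE tables might look like this; the column aliases are only illustrative:

```sql
-- Baseline for the "Totals" row: aggregate Sales directly, without joining
-- to SalesTags, so a sale with several tags is counted only once.
SELECT      s.Salesman
            , COUNT (*)         AS totalSales
            , SUM (s.Amount)    AS totalAmount
FROM        Sales s
WHERE       s.Salesman IN (2, 3)
GROUP BY    s.Salesman
ORDER BY    s.Salesman;
```

With the sample data, both salesmen come out at 6 sales with a total amount of 11, which is what the Totals rows above show.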

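How do I aggregate results like this?

Note that this is running on MS SQL Server 2017, and I can't change the table schema.

2 Answers:

Answer 0: (score: 4)

Here is one approach. Run the query step by step, CTE by CTE, and examine the intermediate results to understand how it works.

It is not the most efficient approach, because I end up joining the table to itself to exclude sales that were already aggregated, but at the moment I don't see how to avoid that.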

WITH usersSales 
AS 
(  -- actual base CTE is much more complex
    SELECT
        s.SaleID
        , s.Amount
    FROM Sales s
    WHERE s.Salesman = 2
)
,CTE_Sums
AS
(
    SELECT
        t.TagName
        ,us.Amount
        ,us.SaleID
        ,SUM(us.Amount) OVER (PARTITION BY t.TagName) AS TagAmount
        ,COUNT(*) OVER (PARTITION BY t.TagName) AS TagSales
    FROM
        usersSales us
        INNER JOIN SalesTags st ON st.SaleID = us.SaleID
        INNER JOIN Tags t ON t.TagId = st.TagId
)
,CTE_Rank
AS
(
    SELECT
        TagName
        ,Amount
        ,SaleID
        ,TagAmount
        ,TagSales
        ,DENSE_RANK() OVER (ORDER BY TagAmount DESC, TagSales DESC, TagName) AS rnk
    FROM CTE_Sums
)
,CTE_Final
AS
(
    SELECT
        Main.TagName
        ,Main.Amount
        ,Main.SaleID
        ,Main.TagAmount
        ,Main.TagSales
        ,Main.rnk
        ,ISNULL(A.FinalTagAmount, 0) AS FinalTagAmount
        ,A.FinalTagSales
    FROM
        CTE_Rank AS Main
        OUTER APPLY
        (
            SELECT
                SUM(Detail.Amount) AS FinalTagAmount
                ,COUNT(*) AS FinalTagSales
            FROM CTE_Rank AS Detail
            WHERE
                Detail.rnk = Main.rnk
                AND Detail.SaleID NOT IN
                (
                    SELECT PrevRanks.SaleID
                    FROM CTE_Rank AS PrevRanks
                    WHERE PrevRanks.rnk < Detail.rnk
                )
        ) AS A
)
SELECT
    TagName
    ,MIN(FinalTagAmount) AS FinalTagAmount
    ,MIN(FinalTagSales) AS FinalTagSales
    ,rnk
    ,0 AS SortOrder
FROM CTE_Final
WHERE rnk <= 3
GROUP BY
    TagName
    ,rnk

UNION ALL

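SELECT
    '- All Others -' AS TagName
    ,SUM(FinalTagAmount) AS FinalTagAmount
    ,SUM(FinalTagSales) AS FinalTagSales
    ,0 AS rnk
    ,1 AS SortOrder
FROM
    (
        -- CTE_Final repeats each rank's totals on every row of that rank,
        -- so reduce to one row per rank before summing
        SELECT DISTINCT rnk, FinalTagAmount, FinalTagSales
        FROM CTE_Final
        WHERE rnk > 3
    ) AS Others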

ORDER BY
    SortOrder
    ,rnk
;

CTE_Rank

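This CTE does not group and aggregate the rows yet; instead it uses window aggregates to get the rank of each tag. Later on we need the individual rows (SaleID) and individual amounts to filter out the rows that have already been used.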

+------------------+--------+--------+-----------+----------+-----+
|     TagName      | Amount | SaleID | TagAmount | TagSales | rnk |
+------------------+--------+--------+-----------+----------+-----+
| Support_Contract |     -2 |      7 |         7 |        4 |   1 |
| Support_Contract |      3 |      5 |         7 |        4 |   1 |
| Support_Contract |      4 |      6 |         7 |        4 |   1 |
| Support_Contract |      2 |      2 |         7 |        4 |   1 |
| Service          |      1 |      4 |         4 |        2 |   2 |
| Service          |      3 |      3 |         4 |        2 |   2 |
| Promo_Gift       |      4 |      6 |         4 |        1 |   3 |
| Software         |      1 |      4 |         3 |        2 |   4 |
| Software         |      2 |      2 |         3 |        2 |   4 |
| Parts            |      1 |      4 |         1 |        1 |   5 |
| Rebate           |     -2 |      7 |        -2 |        1 |   6 |
+------------------+--------+--------+-----------+----------+-----+
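
To see why window aggregates are used here instead of GROUP BY, the two styles can be run side by side against the MCVE tables. This is only an illustrative sketch, with salesman 2 hard-coded as in the answer:

```sql
-- GROUP BY collapses each tag to one row, so the individual SaleIDs are gone.
SELECT
    t.TagName
    ,SUM(s.Amount) AS TagAmount
FROM
    Sales s
    INNER JOIN SalesTags st ON st.SaleID = s.SaleID
    INNER JOIN Tags t ON t.TagId = st.TagId
WHERE s.Salesman = 2
GROUP BY t.TagName;

-- A window aggregate computes the same per-tag total but keeps one row
-- per (tag, sale) pair, so later steps can still filter on SaleID.
SELECT
    t.TagName
    ,s.SaleID
    ,s.Amount
    ,SUM(s.Amount) OVER (PARTITION BY t.TagName) AS TagAmount
FROM
    Sales s
    INNER JOIN SalesTags st ON st.SaleID = s.SaleID
    INNER JOIN Tags t ON t.TagId = st.TagId
WHERE s.Salesman = 2;
```

For salesman 2 the first query returns 6 rows (one per tag) and the second returns 11 (one per tag/sale pair), the same 11 rows listed in the table above.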

CTE_Final

The main calculation happens in the OUTER APPLY, which filters out sales that were already encountered under higher-ranked tags.

+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
|     TagName      | Amount | SaleID | TagAmount | TagSales | rnk | FinalTagAmount | FinalTagSales |
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
| Support_Contract |     -2 |      7 |         7 |        4 |   1 |              7 |             4 |
| Support_Contract |      3 |      5 |         7 |        4 |   1 |              7 |             4 |
| Support_Contract |      4 |      6 |         7 |        4 |   1 |              7 |             4 |
| Support_Contract |      2 |      2 |         7 |        4 |   1 |              7 |             4 |
| Service          |      1 |      4 |         4 |        2 |   2 |              4 |             2 |
| Service          |      3 |      3 |         4 |        2 |   2 |              4 |             2 |
| Promo_Gift       |      4 |      6 |         4 |        1 |   3 |              0 |             0 |
| Software         |      1 |      4 |         3 |        2 |   4 |              0 |             0 |
| Software         |      2 |      2 |         3 |        2 |   4 |              0 |             0 |
| Parts            |      1 |      4 |         1 |        1 |   5 |              0 |             0 |
| Rebate           |     -2 |      7 |        -2 |        1 |   6 |              0 |             0 |
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+

Query result

The final query simply puts the top 3 tags together with all the remaining tags.

+------------------+----------------+---------------+-----+-----------+
|     TagName      | FinalTagAmount | FinalTagSales | rnk | SortOrder |
+------------------+----------------+---------------+-----+-----------+
| Support_Contract |              7 |             4 |   1 |         0 |
| Service          |              4 |             2 |   2 |         0 |
| Promo_Gift       |              0 |             0 |   3 |         0 |
| - All Others -   |              0 |             0 |   0 |         1 |
+------------------+----------------+---------------+-----+-----------+
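
One possible way to avoid the self-join mentioned at the start of this answer is to note that a sale counted "in row k, excluding earlier rows" is simply a sale whose best-ranked tag has rank k. The sketch below is not part of the original answer and has not been benchmarked. The parameters @Salesman and @TopN and the CTE names tagged, tagRank and saleOwner are hypothetical, and it assumes every sale has at least one tag, as the question states:

```sql
-- Sketch only: assign each sale to its single best-ranked tag, then aggregate.
DECLARE @Salesman int = 2, @TopN int = 3;   -- hypothetical parameters

WITH usersSales AS
(
    SELECT s.SaleID, s.Amount
    FROM Sales s
    WHERE s.Salesman = @Salesman
)
,tagged AS      -- one row per (sale, tag) pair
(
    SELECT us.SaleID, us.Amount, t.TagName
    FROM
        usersSales us
        INNER JOIN SalesTags st ON st.SaleID = us.SaleID
        INNER JOIN Tags t ON t.TagId = st.TagId
)
,tagRank AS     -- one row per tag, ranked as in the question
(
    SELECT
        TagName
        ,ROW_NUMBER() OVER (ORDER BY SUM(Amount) DESC, COUNT(*) DESC, TagName) AS rnk
    FROM tagged
    GROUP BY TagName
)
,saleOwner AS   -- one row per sale, with the best (lowest) rank among its tags
(
    SELECT tg.SaleID, tg.Amount, MIN(tr.rnk) AS bestRnk
    FROM
        tagged tg
        INNER JOIN tagRank tr ON tr.TagName = tg.TagName
    GROUP BY tg.SaleID, tg.Amount
)
SELECT
    tr.TagName
    ,ISNULL(SUM(so.Amount), 0) AS FinalTagAmount
    ,COUNT(so.SaleID) AS FinalTagSales
    ,tr.rnk
    ,0 AS SortOrder
FROM
    tagRank tr
    LEFT JOIN saleOwner so ON so.bestRnk = tr.rnk
WHERE tr.rnk <= @TopN
GROUP BY tr.TagName, tr.rnk

UNION ALL

SELECT
    '- All Others -'
    ,ISNULL(SUM(so.Amount), 0)
    ,COUNT(so.SaleID)
    ,0
    ,1
FROM saleOwner so
WHERE so.bestRnk > @TopN

ORDER BY
    SortOrder
    ,rnk
;
```

For salesman 2 with N = 3 this should return the same four rows as the query result above. Because each sale appears exactly once in saleOwner, the rows also add up to the user's overall totals.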

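Answer 1: (score: 1)

The approach below builds the solution step by step using a couple of local temporary tables. This minimizes access to the base tables, provides more indexing opportunities, and gives the query optimizer better statistics.

Setup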

-- Parameters
DECLARE
    @PersonId integer = 3,
    @TopN bigint = 3;

-- Holds sales data extract for @PersonId
CREATE TABLE #Sales
(
    SaleID integer NOT NULL,
    Amount integer NOT NULL,
    TagName varchar(100) NOT NULL,

    PRIMARY KEY (TagName, SaleID)
);

-- Computed totals (for final output)
CREATE TABLE #TagTotals
(
    Position integer IDENTITY (1, 1) NOT NULL PRIMARY KEY,
    TagName varchar(100) NULL UNIQUE,
    NumSales bigint NOT NULL,
    SumSales integer NOT NULL,
);

Data load

-- Fetch sales data for the @PersonId once
INSERT #Sales
(
    SaleID,
    Amount,
    TagName
)
SELECT
    S.SaleID,
    S.Amount,
    T.TagName
FROM dbo.Sales AS S 
JOIN dbo.SalesTags AS ST
    ON ST.SaleID = S.SaleID
JOIN dbo.Tags AS T
    ON T.TagId = ST.TagId
WHERE 
    S.Salesman = @PersonId;


-- Find the @TopN top categories
INSERT #TagTotals
(
    TagName,
    NumSales,
    SumSales
)
SELECT
    S.TagName, 
    NumSales = COUNT_BIG(*), 
    SumSales = SUM(S.Amount)
FROM #Sales AS S
GROUP BY 
    S.TagName
ORDER BY 
    SumSales DESC, 
    NumSales DESC, 
    S.TagName ASC
OFFSET 0 ROWS 
FETCH FIRST @TopN ROWS ONLY;


Compute totals with dependencies

-- Recalculate totals for categories with dependencies
UPDATE TT
SET NumSales = TagSales.NumSales, 
    SumSales = ISNULL(TagSales.SumSales, 0)
FROM #TagTotals AS TT
CROSS APPLY
(
    SELECT 
        NumSales = COUNT_BIG(*), 
        SumSales = SUM(S.Amount)
    FROM #Sales AS S
    WHERE
        -- For the current tag
        S.TagName = TT.TagName
        -- Exclude sales covered by previous tags
        AND S.SaleID NOT IN
        (
            SELECT
                S2.SaleID
            FROM #TagTotals AS PreviousTags
            JOIN #Sales AS S2
                ON S2.TagName = PreviousTags.TagName
            WHERE
                PreviousTags.Position < TT.Position
        )
) AS TagSales
-- First category has no exclusions to handle
WHERE
    TT.Position > 1;


Add all others

-- Add '- All Others -' category
INSERT #TagTotals
(
    TagName,
    NumSales,
    SumSales
)
SELECT 
    '- All Others -', 
    NumSales = COUNT_BIG(*), 
    SumSales = ISNULL(SUM(S.Amount), 0)
FROM #Sales AS S
WHERE S.SaleID NOT IN
(
    -- Sales already accounted for
    SELECT
        S2.SaleID
    FROM #TagTotals AS O
    JOIN #Sales AS S2
        ON S2.TagName = O.TagName
);


Grand total

-- Add grand total
INSERT #TagTotals
(
    TagName,
    NumSales,
    SumSales
)
SELECT 
    'Totals', 
    NumSales = ISNULL(SUM(O.NumSales), 0), 
    SumSales = ISNULL(SUM(O.SumSales), 0)
FROM #TagTotals AS O;


Final output

-- Final output
SELECT
    [Top Category] = O.TagName,
    [Amount] = O.SumSales,
    [Number of Sales] = O.NumSales 
FROM #TagTotals AS O 
ORDER BY
    O.Position ASC;


Result for @PersonId = 2:

╔══════════════════╦════════╦═════════════════╗
║   Top Category   ║ Amount ║ Number of Sales ║
╠══════════════════╬════════╬═════════════════╣
║ Support_Contract ║      7 ║               4 ║
║ Service          ║      4 ║               2 ║
║ Promo_Gift       ║      0 ║               0 ║
║ - All Others -   ║      0 ║               0 ║
║ Totals           ║     11 ║               6 ║
╚══════════════════╩════════╩═════════════════╝

Result for @PersonId = 3:

╔══════════════════╦════════╦═════════════════╗
║   Top Category   ║ Amount ║ Number of Sales ║
╠══════════════════╬════════╬═════════════════╣
║ Support_Contract ║      7 ║               4 ║
║ Promo_Gift       ║      0 ║               0 ║
║ Software         ║      1 ║               1 ║
║ - All Others -   ║      3 ║               1 ║
║ Totals           ║     11 ║               6 ║
╚══════════════════╩════════╩═════════════════╝
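
Because the script creates local temporary tables, re-running it in the same session (for example, with a different @PersonId) would fail on the CREATE TABLE statements unless the tables are dropped first. A minimal cleanup step, which is an addition here rather than part of the original answer, could be:

```sql
-- Drop the work tables so the script can be re-run in the same session.
-- DROP TABLE IF EXISTS requires SQL Server 2016 or later
-- (the question targets SQL Server 2017).
DROP TABLE IF EXISTS #Sales, #TagTotals;
```

Placing this at the very top of the script as well makes it safe to re-run after a partial failure.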

Demo: db<>fiddle