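I have tables that list user sales by category (each sale has at least one category, and possibly several).
I can get the top categories for a user, but I need statistics for the user broken out by both his/her top N categories and all the remaining categories.
I've boiled the problem down to an MCVE, as follows...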
MCVE Data Summary:
Salesman  SaleID  Amount  Categories
--------  ------  ------  ------------------------------
       1       1       2  Service
       2       2       2  Software, Support_Contract
       2       3       3  Service
       2       4       1  Parts, Service, Software
       2       5       3  Support_Contract
       2       6       4  Promo_Gift, Support_Contract
       2       7      -2  Rebate, Support_Contract
       3       8       2  Software, Support_Contract
       3       9       3  Service
       3      10       1  Parts, Software
       3      11       3  Support_Contract
       3      12       4  Promo_Gift, Support_Contract
       3      13      -2  Rebate, Support_Contract
MCVE setup SQL:
CREATE TABLE Sales ([Salesman] int, [SaleID] int, [Amount] int);
CREATE TABLE SalesTags ([SaleID] int, [TagId] int);
CREATE TABLE Tags ([TagId] int, [TagName] varchar(100) );
INSERT INTO Sales
([Salesman], [SaleID], [Amount])
VALUES
(1, 1, 2), (2, 6, 4), (3, 10, 1),
(2, 2, 2), (2, 7, -2), (3, 11, 3),
(2, 3, 3), (3, 8, 2), (3, 12, 4),
(2, 4, 1), (3, 9, 3), (3, 13, -2),
(2, 5, 3)
;
INSERT INTO SalesTags
([SaleID], [TagId])
VALUES
(1, 3), (6, 4), (10, 1),
(2, 1), (6, 5), (10, 2),
(2, 4), (7, 4), (11, 4),
(3, 3), (7, 6), (12, 4),
(4, 1), (8, 1), (12, 5),
(4, 2), (8, 4), (13, 4),
(4, 3), (9, 3), (13, 6),
(5, 4)
;
INSERT INTO Tags
([TagId], [TagName])
VALUES
(1, 'Software'),
(2, 'Parts'),
(3, 'Service'),
(4, 'Support_Contract'),
(5, 'Promo_Gift'),
(6, 'Rebate')
;
As shown in this SQL Fiddle, I can get the top N tags for a user like so:
WITH usersSales AS ( -- actual base CTE is much more complex
SELECT s.SaleID
, s.Amount
FROM Sales s
WHERE s.Salesman = 2
)
SELECT Top 3 -- N can be 3 to 10
t.TagName
, COUNT (us.SaleID) AS tagSales
, SUM (us.Amount) AS tagAmount
FROM usersSales us
INNER JOIN SalesTags st ON st.SaleID = us.SaleID
INNER JOIN Tags t ON t.TagId = st.TagId
GROUP BY t.TagName
ORDER BY tagAmount DESC
, tagSales DESC
, t.TagName
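This shows the user's top categories to be Support_Contract, Service, Promo_Gift, in that order, for user 2. (And Support_Contract, Promo_Gift, Software for user 3.)
But the desired result, for N = 3, is:
User 2: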
Top Category      Amount  Number of Sales
----------------  ------  ---------------
Support_Contract       7                4
Service                4                2
Promo_Gift             0                0
- All Others -         0                0
=========================================
Totals                11                6
User 3:
Top Category      Amount  Number of Sales
----------------  ------  ---------------
Support_Contract       7                4
Promo_Gift             0                0
Software               1                1
- All Others -         3                1
=========================================
Totals                11                6
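Where: sales already counted under a higher-ranked category are not repeated in a lower one, and all of the user's remaining sales that are not in the top groups are aggregated in the - All Others - group.
How does one aggregate results like that?
Note that this is running on MS SQL Server 2017, and I can't change the table schema.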
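Answer 0 (score: 4)
Here is one way to do it. Run the query step by step, CTE by CTE, and examine the intermediate results to understand how it works.
It is not the most efficient method, because I end up joining the table to itself to eliminate sales that were already counted under a higher-ranked tag, but at the moment I don't see how to avoid that.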
WITH usersSales
AS
( -- actual base CTE is much more complex
SELECT
s.SaleID
, s.Amount
FROM Sales s
WHERE s.Salesman = 2
)
,CTE_Sums
AS
(
SELECT
t.TagName
,us.Amount
,us.SaleID
,SUM(us.Amount) OVER (PARTITION BY t.TagName) AS TagAmount
,COUNT(*) OVER (PARTITION BY t.TagName) AS TagSales
FROM
usersSales us
INNER JOIN SalesTags st ON st.SaleID = us.SaleID
INNER JOIN Tags t ON t.TagId = st.TagId
)
,CTE_Rank
AS
(
SELECT
TagName
,Amount
,SaleID
,TagAmount
,TagSales
,DENSE_RANK() OVER (ORDER BY TagAmount DESC, TagSales DESC, TagName) AS rnk
FROM CTE_Sums
)
,CTE_Final
AS
(
SELECT
Main.TagName
,Main.Amount
,Main.SaleID
,Main.TagAmount
,Main.TagSales
,Main.rnk
,ISNULL(A.FinalTagAmount, 0) AS FinalTagAmount
,A.FinalTagSales
FROM
CTE_Rank AS Main
OUTER APPLY
(
SELECT
SUM(Detail.Amount) AS FinalTagAmount
,COUNT(*) AS FinalTagSales
FROM CTE_Rank AS Detail
WHERE
Detail.rnk = Main.rnk
AND Detail.SaleID NOT IN
(
SELECT PrevRanks.SaleID
FROM CTE_Rank AS PrevRanks
WHERE PrevRanks.rnk < Detail.rnk
)
) AS A
)
SELECT
TagName
,MIN(FinalTagAmount) AS FinalTagAmount
,MIN(FinalTagSales) AS FinalTagSales
,rnk
,0 AS SortOrder
FROM CTE_Final
WHERE rnk <= 3
GROUP BY
TagName
,rnk
UNION ALL
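SELECT
'- All Others -' AS TagName
,SUM(FinalTagAmount) AS FinalTagAmount
,SUM(FinalTagSales) AS FinalTagSales
,0 AS rnk
,1 AS SortOrder
FROM
(
-- One row per tag first: CTE_Final repeats a tag's totals on each of its rows
SELECT TagName
,MIN(FinalTagAmount) AS FinalTagAmount
,MIN(FinalTagSales) AS FinalTagSales
FROM CTE_Final
WHERE rnk > 3
GROUP BY TagName
) AS PerTag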
ORDER BY
SortOrder
,rnk
;
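CTE_Rank
This CTE does not group and summarize the rows yet. Instead it uses window aggregates to get the rank of each tag, because later we need the individual rows (SaleID) and their individual amounts to filter out sales that were already counted.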
+------------------+--------+--------+-----------+----------+-----+
| TagName | Amount | SaleID | TagAmount | TagSales | rnk |
+------------------+--------+--------+-----------+----------+-----+
| Support_Contract | -2 | 7 | 7 | 4 | 1 |
| Support_Contract | 3 | 5 | 7 | 4 | 1 |
| Support_Contract | 4 | 6 | 7 | 4 | 1 |
| Support_Contract | 2 | 2 | 7 | 4 | 1 |
| Service | 1 | 4 | 4 | 2 | 2 |
| Service | 3 | 3 | 4 | 2 | 2 |
| Promo_Gift | 4 | 6 | 4 | 1 | 3 |
| Software | 1 | 4 | 3 | 2 | 4 |
| Software | 2 | 2 | 3 | 2 | 4 |
| Parts | 1 | 4 | 1 | 1 | 5 |
| Rebate | -2 | 7 | -2 | 1 | 6 |
+------------------+--------+--------+-----------+----------+-----+
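To reproduce this intermediate result on its own, one option is to keep only the first CTEs of the query and finish with a plain SELECT. The following is a minimal sketch that assumes the MCVE tables from the question; the final ORDER BY is only added here for readability and is not part of the full query.

```sql
-- Inspect the per-row ranking for salesman 2 without any grouping.
WITH usersSales AS
(
    SELECT s.SaleID, s.Amount
    FROM Sales s
    WHERE s.Salesman = 2
)
,CTE_Sums AS
(
    SELECT
        t.TagName
        ,us.Amount
        ,us.SaleID
        -- Window aggregates keep every detail row and repeat the tag totals on each
        ,SUM(us.Amount) OVER (PARTITION BY t.TagName) AS TagAmount
        ,COUNT(*)       OVER (PARTITION BY t.TagName) AS TagSales
    FROM usersSales us
    INNER JOIN SalesTags st ON st.SaleID = us.SaleID
    INNER JOIN Tags t ON t.TagId = st.TagId
)
SELECT
    TagName
    ,Amount
    ,SaleID
    ,TagAmount
    ,TagSales
    -- TagName is the last tie-breaker, so every tag gets its own rank
    ,DENSE_RANK() OVER (ORDER BY TagAmount DESC, TagSales DESC, TagName) AS rnk
FROM CTE_Sums
ORDER BY rnk, SaleID;
```

A GROUP BY at this stage would collapse the rows to one per tag and lose the SaleID values that the later filtering step depends on.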
CTE_Final
The main calculation happens in the OUTER APPLY, which filters out sales that were already encountered under a higher-ranked tag.
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
| TagName | Amount | SaleID | TagAmount | TagSales | rnk | FinalTagAmount | FinalTagSales |
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
| Support_Contract | -2 | 7 | 7 | 4 | 1 | 7 | 4 |
| Support_Contract | 3 | 5 | 7 | 4 | 1 | 7 | 4 |
| Support_Contract | 4 | 6 | 7 | 4 | 1 | 7 | 4 |
| Support_Contract | 2 | 2 | 7 | 4 | 1 | 7 | 4 |
| Service | 1 | 4 | 4 | 2 | 2 | 4 | 2 |
| Service | 3 | 3 | 4 | 2 | 2 | 4 | 2 |
| Promo_Gift | 4 | 6 | 4 | 1 | 3 | 0 | 0 |
| Software | 1 | 4 | 3 | 2 | 4 | 0 | 0 |
| Software | 2 | 2 | 3 | 2 | 4 | 0 | 0 |
| Parts | 1 | 4 | 1 | 1 | 5 | 0 | 0 |
| Rebate | -2 | 7 | -2 | 1 | 6 | 0 | 0 |
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
Query result
The final SELECT simply puts the top 3 tags and the aggregate of all the remaining tags together.
+------------------+----------------+---------------+-----+-----------+
| TagName | FinalTagAmount | FinalTagSales | rnk | SortOrder |
+------------------+----------------+---------------+-----+-----------+
| Support_Contract | 7 | 4 | 1 | 0 |
| Service | 4 | 2 | 2 | 0 |
| Promo_Gift | 0 | 0 | 3 | 0 |
| - All Others - | 0 | 0 | 0 | 1 |
+------------------+----------------+---------------+-----+-----------+
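On the self-join concern raised at the start of this answer, one possible alternative is sketched below. It is not the query above, only an untested variation on it: because every tag has its own rank, each sale can be attributed to the best-ranked tag it carries with MIN(rnk) OVER (PARTITION BY SaleID), which replaces the NOT IN lookup. The names @Salesman, @TopN, CTE_Best, CTE_Label, bestRnk, Label, SortKey, CountedAmount and IsCounted are made up for this illustration, and the MCVE tables from the question are assumed.

```sql
-- Illustrative parameters, not part of the original query
DECLARE @Salesman int = 2, @TopN int = 3;

WITH usersSales AS
(
    SELECT s.SaleID, s.Amount
    FROM Sales s
    WHERE s.Salesman = @Salesman
)
,CTE_Sums AS
(
    SELECT
        t.TagName
        ,us.Amount
        ,us.SaleID
        ,SUM(us.Amount) OVER (PARTITION BY t.TagName) AS TagAmount
        ,COUNT(*)       OVER (PARTITION BY t.TagName) AS TagSales
    FROM usersSales us
    INNER JOIN SalesTags st ON st.SaleID = us.SaleID
    INNER JOIN Tags t ON t.TagId = st.TagId
)
,CTE_Rank AS
(
    SELECT
        TagName
        ,Amount
        ,SaleID
        ,DENSE_RANK() OVER (ORDER BY TagAmount DESC, TagSales DESC, TagName) AS rnk
    FROM CTE_Sums
)
,CTE_Best AS
(
    -- The best (lowest) rank among the tags of each sale is the tag that "owns" the sale
    SELECT
        TagName
        ,Amount
        ,SaleID
        ,rnk
        ,MIN(rnk) OVER (PARTITION BY SaleID) AS bestRnk
    FROM CTE_Rank
)
,CTE_Label AS
(
    SELECT
        CASE WHEN rnk <= @TopN THEN TagName ELSE '- All Others -' END AS Label
        ,CASE WHEN rnk <= @TopN THEN rnk ELSE @TopN + 1 END AS SortKey
        -- A row only contributes when its tag is the owning tag of the sale
        ,CASE WHEN rnk = bestRnk THEN Amount ELSE 0 END AS CountedAmount
        ,CASE WHEN rnk = bestRnk THEN 1 ELSE 0 END AS IsCounted
    FROM CTE_Best
)
SELECT
    Label AS TagName
    ,SUM(CountedAmount) AS FinalTagAmount
    ,SUM(IsCounted) AS FinalTagSales
FROM CTE_Label
GROUP BY Label, SortKey
ORDER BY SortKey;
```

For salesman 2 and @TopN = 3 this should give the same four rows as the result above. One difference to be aware of: when a salesman has no tags ranked below @TopN, this sketch returns no - All Others - row at all rather than a row of zeros.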
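Answer 1 (score: 1)
The approach below builds the solution step by step using a couple of local temporary tables. This minimizes access to the base tables, provides more indexing opportunities, and gives the query optimizer better statistics.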
-- Parameters
DECLARE
@PersonId integer = 3,
@TopN bigint = 3;
-- Holds sales data extract for @PersonId
CREATE TABLE #Sales
(
SaleID integer NOT NULL,
Amount integer NOT NULL,
TagName varchar(100) NOT NULL,
PRIMARY KEY (TagName, SaleID)
);
-- Computed totals (for final output)
CREATE TABLE #TagTotals
(
Position integer IDENTITY (1, 1) NOT NULL PRIMARY KEY,
TagName varchar(100) NULL UNIQUE,
NumSales bigint NOT NULL,
SumSales integer NOT NULL,
);
-- Fetch sales data for the @PersonId once
INSERT #Sales
(
SaleID,
Amount,
TagName
)
SELECT
S.SaleID,
S.Amount,
T.TagName
FROM dbo.Sales AS S
JOIN dbo.SalesTags AS ST
ON ST.SaleID = S.SaleID
JOIN dbo.Tags AS T
ON T.TagId = ST.TagId
WHERE
S.Salesman = @PersonId;
-- Find the @TopN top categories
INSERT #TagTotals
(
TagName,
NumSales,
SumSales
)
SELECT
S.TagName,
NumSales = COUNT_BIG(*),
SumSales = SUM(S.Amount)
FROM #Sales AS S
GROUP BY
S.TagName
ORDER BY
SumSales DESC,
NumSales DESC,
S.TagName ASC
OFFSET 0 ROWS
FETCH FIRST @TopN ROWS ONLY;
-- Recalculate totals for categories with dependencies
UPDATE TT
SET NumSales = TagSales.NumSales,
SumSales = ISNULL(TagSales.SumSales, 0)
FROM #TagTotals AS TT
CROSS APPLY
(
SELECT
NumSales = COUNT_BIG(*),
SumSales = SUM(S.Amount)
FROM #Sales AS S
WHERE
-- For the current tag
S.TagName = TT.TagName
-- Exclude sales covered by previous tags
AND S.SaleID NOT IN
(
SELECT
S2.SaleID
FROM #TagTotals AS PreviousTags
JOIN #Sales AS S2
ON S2.TagName = PreviousTags.TagName
WHERE
PreviousTags.Position < TT.Position
)
) AS TagSales
-- First category has no exclusions to handle
WHERE
TT.Position > 1;
-- Add '- All Others -' category
INSERT #TagTotals
(
TagName,
NumSales,
SumSales
)
SELECT
'- All Others -',
NumSales = COUNT_BIG(*),
SumSales = ISNULL(SUM(S.Amount), 0)
FROM #Sales AS S
WHERE S.SaleID NOT IN
(
-- Sales already accounted for
SELECT
S2.SaleID
FROM #TagTotals AS O
JOIN #Sales AS S2
ON S2.TagName = O.TagName
);
-- Add grand total
INSERT #TagTotals
(
TagName,
NumSales,
SumSales
)
SELECT
'Totals',
NumSales = ISNULL(SUM(O.NumSales), 0),
SumSales = ISNULL(SUM(O.SumSales), 0)
FROM #TagTotals AS O;
-- Final output
SELECT
[Top Category] = O.TagName,
[Amount] = O.SumSales,
[Number of Sales] = O.NumSales
FROM #TagTotals AS O
ORDER BY
O.Position ASC;
Result for @PersonId = 2:
╔══════════════════╦════════╦═════════════════╗
║ Top Category     ║ Amount ║ Number of Sales ║
╠══════════════════╬════════╬═════════════════╣
║ Support_Contract ║      7 ║               4 ║
║ Service          ║      4 ║               2 ║
║ Promo_Gift       ║      0 ║               0 ║
║ - All Others -   ║      0 ║               0 ║
║ Totals           ║     11 ║               6 ║
╚══════════════════╩════════╩═════════════════╝
Result for @PersonId = 3:
╔══════════════════╦════════╦═════════════════╗
║ Top Category     ║ Amount ║ Number of Sales ║
╠══════════════════╬════════╬═════════════════╣
║ Support_Contract ║      7 ║               4 ║
║ Promo_Gift       ║      0 ║               0 ║
║ Software         ║      1 ║               1 ║
║ - All Others -   ║      3 ║               1 ║
║ Totals           ║     11 ║               6 ║
╚══════════════════╩════════╩═════════════════╝
Demo on db<>fiddle
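If the script is run more than once in the same session, for example first with @PersonId = 2 and then with 3, the CREATE TABLE statements will fail because the temporary tables still exist. A small cleanup step placed before the CREATE TABLE statements, or after the final output, avoids that. This is an addition to the script above rather than part of it, and it assumes SQL Server 2016 or later, which covers the 2017 instance mentioned in the question.

```sql
-- Remove the temp tables left over from a previous run in this session.
-- IF EXISTS makes the statement a no-op on the very first run.
DROP TABLE IF EXISTS #Sales, #TagTotals;
```

Dropping #TagTotals also resets its IDENTITY column, so Position starts at 1 again on the next run, which the Position > 1 filter in the UPDATE step relies on.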