I've been trying to put together a SQL query for several hours and can't seem to get it right. Consider the following example tables, Products and ProductCategories:
Products
--------
ProductId  ProductName
---------  -----------
1          Achilles
2          Hermes
3          Apollo
4          Zeus
5          Poseidon
6          Eros
ProductCategories
-----------------
ProductId  Category
---------  --------
1          Wars
1          Wars|Trojan
1          Wars|Trojans|Mortals
1          Toys|Games
2          Travel
2          Travel|Trade
2          Communication|Language|Writing
5          Oceanware
6          Love
6          Love|Candy
6          Love|Valentines
3          Sunshine
4          Lightning
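The goal is to select the ProductId, the ProductName, and one of the categories associated with the product, so that each ProductId/ProductName appears only once in the result and the chosen category is the one containing the most pipe characters. If two (or more) of a product's categories tie for the most pipes, returning any one of them is fine.
In other words, the query should produce this result set: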
ProductId  ProductName  Category
---------  -----------  --------
1          Achilles     Wars|Trojans|Mortals
2          Hermes       Communication|Language|Writing
3          Apollo       Sunshine
4          Zeus         Lightning
5          Poseidon     Oceanware
6          Eros         Love|Valentines
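(Note that returning Love|Candy as the category for Eros would also be acceptable.)
So far I have this SQL, which obviously doesn't work, because it returns one row for every product/category combination instead of only the category with the most pipes: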
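To pin down the selection rule before writing any SQL, here is a plain-Python restatement of it (a sketch, not SQL): for each product, keep the category with the highest count of '|' characters, with ties resolved arbitrarily. The function name pick_categories is hypothetical; the data is the sample data from the question.

```python
# Plain-Python restatement of the requirement, using the question's sample data.
products = {1: "Achilles", 2: "Hermes", 3: "Apollo", 4: "Zeus", 5: "Poseidon", 6: "Eros"}
product_categories = [
    (1, "Wars"), (1, "Wars|Trojan"), (1, "Wars|Trojans|Mortals"), (1, "Toys|Games"),
    (2, "Travel"), (2, "Travel|Trade"), (2, "Communication|Language|Writing"),
    (5, "Oceanware"), (6, "Love"), (6, "Love|Candy"), (6, "Love|Valentines"),
    (3, "Sunshine"), (4, "Lightning"),
]


def pick_categories(products, product_categories):
    """Return (ProductId, ProductName, Category) with the most pipes per product."""
    best = {}  # ProductId -> category with the most pipes seen so far
    for product_id, category in product_categories:
        current = best.get(product_id)
        # Strictly greater: on a tie, the first category seen is kept.
        if current is None or category.count("|") > current.count("|"):
            best[product_id] = category
    # One row per product that has at least one category (inner-join semantics).
    return [(pid, name, best[pid]) for pid, name in sorted(products.items()) if pid in best]


rows = pick_categories(products, product_categories)
for row in rows:
    print(row)
```

Because ties keep the first category encountered, Eros comes out as Love|Candy here, which the question explicitly allows.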
SELECT
ProductId,
ProductName,
Category,
MAX(PipeCount)
FROM
(
SELECT DISTINCT
p.ProductId AS ProductId,
p.ProductName AS ProductName,
c.Category AS Category,
LEN(c.Category) - LEN(REPLACE(c.Category, '|', '')) AS PipeCount
FROM
Products p
INNER JOIN ProductCategories c
ON p.ProductId = c.ProductId
) Subquery
GROUP BY ProductId, ProductName, Category, PipeCount
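However, I can't seem to get this query any closer than this. What I want is only the row for each product whose PipeCount equals the maximum PipeCount across all of that product's rows. Any help would be greatly appreciated. Note that this isn't my actual data; the real data is much more complex, but this example should suffice. I'm working on SQL Server 2012, but I'd like a good answer to be compatible with almost any version of SQL.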
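The PipeCount expression in the query above counts pipes by measuring how much shorter the string gets once every '|' is removed. The sketch below runs that idiom, plus the equivalent "double every pipe" variant, in SQLite via Python's sqlite3 module. SQLite's LENGTH stands in for SQL Server's LEN here, which is an assumption of the sketch: LEN also ignores trailing spaces, which does not matter for this sample data.

```python
import sqlite3

# SQLite's LENGTH is used as a stand-in for SQL Server's LEN.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductCategories (ProductId INTEGER, Category TEXT)")
conn.executemany(
    "INSERT INTO ProductCategories VALUES (?, ?)",
    [(1, "Wars"), (1, "Wars|Trojan"), (1, "Wars|Trojans|Mortals"), (6, "Love|Valentines")],
)

rows = conn.execute(
    """
    SELECT Category,
           -- removing every '|' shortens the string by one character per pipe
           LENGTH(Category) - LENGTH(REPLACE(Category, '|', '')) AS PipesRemoved,
           -- doubling every '|' lengthens the string by one character per pipe
           LENGTH(REPLACE(Category, '|', '||')) - LENGTH(Category) AS PipesDoubled
    FROM ProductCategories
    ORDER BY ProductId, PipesRemoved
    """
).fetchall()

for row in rows:
    print(row)
```

Both columns agree on every row; which form to use is mostly a matter of taste.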
Answer 0 (score: 3)
You can use ROW_NUMBER to get, for each ProductId, the Category with the most pipes:
SELECT
p.*,
pc.Category
FROM Products p
INNER JOIN(
SELECT
*,
RN = ROW_NUMBER() OVER(PARTITION BY ProductId ORDER BY LEN(Category) - LEN(REPLACE(Category, '|', '')) DESC)
FROM ProductCategories
) pc
ON pc.ProductId = p.ProductId
WHERE RN = 1
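To try the ROW_NUMBER approach without a SQL Server instance, here is a sketch that runs the same query shape against an in-memory SQLite database holding the question's sample data. Assumptions: SQLite 3.25 or newer (for window functions), LENGTH in place of LEN, and "AS rn" in place of the T-SQL "RN = ..." alias syntax, which SQLite does not accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Products (ProductId INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE ProductCategories (ProductId INTEGER, Category TEXT);
    INSERT INTO Products VALUES
        (1, 'Achilles'), (2, 'Hermes'), (3, 'Apollo'),
        (4, 'Zeus'), (5, 'Poseidon'), (6, 'Eros');
    INSERT INTO ProductCategories VALUES
        (1, 'Wars'), (1, 'Wars|Trojan'), (1, 'Wars|Trojans|Mortals'), (1, 'Toys|Games'),
        (2, 'Travel'), (2, 'Travel|Trade'), (2, 'Communication|Language|Writing'),
        (5, 'Oceanware'), (6, 'Love'), (6, 'Love|Candy'), (6, 'Love|Valentines'),
        (3, 'Sunshine'), (4, 'Lightning');
    """
)

# Number each product's categories by pipe count (most pipes first); keep row 1.
query = """
SELECT p.ProductId, p.ProductName, pc.Category
FROM Products p
INNER JOIN (
    SELECT ProductId, Category,
           ROW_NUMBER() OVER (
               PARTITION BY ProductId
               ORDER BY LENGTH(Category) - LENGTH(REPLACE(Category, '|', '')) DESC
           ) AS rn
    FROM ProductCategories
) pc ON pc.ProductId = p.ProductId
WHERE pc.rn = 1
ORDER BY p.ProductId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With only the pipe count in the ORDER BY, the tie between Love|Candy and Love|Valentines is broken arbitrarily, which matches what the question allows.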
Answer 1 (score: 2)
Use ROW_NUMBER here:
--CTE as data sample for two tables
;
WITH Products
AS ( SELECT *
FROM ( VALUES ( 1, 'Achilles'), ( 2, 'Hermes'),
( 3, 'Apollo'), ( 4, 'Zeus'), ( 5, 'Poseidon'),
( 6, 'Eros') ) AS t ( ProductId, ProductName )
),
ProductCategories
AS ( SELECT *
FROM ( VALUES ( 1 , 'Wars'), ( 1 , 'Wars|Trojan'),
( 1 , 'Wars|Trojans|Mortals'), ( 1 , 'Toys|Games'),
( 2 , 'Travel'), ( 2 , 'Travel|Trade'),
( 2 , 'Communication|Language|Writing'),
( 5 , 'Oceanware'), ( 6 , 'Love'),
( 6 , 'Love|Candy'), ( 6 , 'Love|Valentines'),
( 3 , 'Sunshine'), ( 4 , 'Lightning') ) AS T ( ProductId, CategoryName )
)
--Final Query
SELECT T.ProductId ,
T.ProductName ,
T.CategoryName
FROM ( SELECT P.ProductID ,
P.ProductName ,
C.CategoryName ,
LEN(C.CategoryName) - LEN(REPLACE(C.CategoryName, '|', '')) AS Pipes ,
ROW_NUMBER() OVER ( PARTITION BY P.ProductID ORDER BY LEN(C.CategoryName)
- LEN(REPLACE(C.CategoryName, '|',
'')) DESC, LEN(C.CategoryName) DESC ) AS RN
FROM Products AS P
JOIN ProductCategories AS C ON P.ProductId = C.ProductId
) AS T
WHERE T.RN = 1
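This answer adds a second sort key, LEN(C.CategoryName) DESC, so that when two categories have the same number of pipes the longer one wins and the result becomes deterministic. The sketch below checks that behaviour in SQLite with a trimmed-down version of the same CTE-plus-ROW_NUMBER query. Assumptions: SQLite 3.25 or newer, LENGTH in place of LEN, and "WITH name(columns) AS (VALUES ...)" in place of the derived-table column list, which SQLite does not support.

```python
import sqlite3

# Trimmed sample data in CTEs; the second ORDER BY key breaks pipe-count ties.
QUERY = """
WITH Products(ProductId, ProductName) AS (
    VALUES (5, 'Poseidon'), (6, 'Eros')
),
ProductCategories(ProductId, CategoryName) AS (
    VALUES (5, 'Oceanware'), (6, 'Love'), (6, 'Love|Candy'), (6, 'Love|Valentines')
)
SELECT ProductId, ProductName, CategoryName
FROM (
    SELECT P.ProductId, P.ProductName, C.CategoryName,
           ROW_NUMBER() OVER (
               PARTITION BY P.ProductId
               ORDER BY LENGTH(C.CategoryName) - LENGTH(REPLACE(C.CategoryName, '|', '')) DESC,
                        LENGTH(C.CategoryName) DESC
           ) AS RN
    FROM Products AS P
    JOIN ProductCategories AS C ON P.ProductId = C.ProductId
) AS T
WHERE T.RN = 1
ORDER BY ProductId
"""

rows = sqlite3.connect(":memory:").execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Love|Candy and Love|Valentines both contain one pipe, but Love|Valentines is longer (15 characters versus 10), so it is always the one returned for Eros.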
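Answer 2 (score: 0)
I eventually solved the problem using several subqueries. One caveat is that it relies on the ProductCategories table in my example having a unique column, which I didn't specify explicitly. In my real data this column already exists; anyone facing a similar problem could add such a column to make this solution work. Here's the SQL: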
SELECT
Sub1.ProductId,
Sub1.ProductName,
Sub3.Category
FROM
(
SELECT
o.ProductId AS ProductId,
o.ProductName AS ProductName,
MAX(LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category)) AS MaxPipeCount
FROM
Products o
INNER JOIN ProductCategories c
ON o.ProductId = c.ProductId
GROUP BY o.ProductId, o.ProductName
) Sub1
INNER JOIN
(
SELECT
o.ProductId AS ProductId,
LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category) AS PipeCount,
MAX(c.UniqueId) AS MaxUniqueId
FROM
Products o
INNER JOIN ProductCategories c
ON o.ProductId = c.ProductId
GROUP BY o.ProductID, LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category)
) Sub2
ON Sub1.MaxPipeCount = Sub2.PipeCount
AND Sub1.ProductId = Sub2.ProductId
INNER JOIN
(
SELECT DISTINCT
o.ProductId,
c.Category,
LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category) AS PipeCount,
c.UniqueId
FROM
Products o
INNER JOIN ProductCategories c
ON o.ProductId = c.ProductId
) Sub3
ON Sub1.MaxPipeCount = Sub3.PipeCount
AND Sub2.MaxUniqueId = Sub3.UniqueId
AND Sub1.ProductId = Sub3.ProductId
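For engines without window functions, the idea behind this answer (find each product's maximum pipe count, take the highest unique id among the rows that reach it, then join back for the category text) can be written with plain derived tables. The sketch below is a condensed restructuring of that idea rather than the answer verbatim; it runs in SQLite, uses LENGTH in place of LEN, and assumes the unique column is an INTEGER PRIMARY KEY named UniqueId, as in the answer's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Products (ProductId INTEGER PRIMARY KEY, ProductName TEXT);
    -- UniqueId is the extra unique column this approach depends on.
    CREATE TABLE ProductCategories (
        UniqueId INTEGER PRIMARY KEY, ProductId INTEGER, Category TEXT
    );
    INSERT INTO Products VALUES (1, 'Achilles'), (6, 'Eros');
    INSERT INTO ProductCategories (ProductId, Category) VALUES
        (1, 'Wars'), (1, 'Wars|Trojans|Mortals'), (1, 'Toys|Games'),
        (6, 'Love'), (6, 'Love|Candy'), (6, 'Love|Valentines');
    """
)

query = """
SELECT p.ProductId, p.ProductName, c.Category
FROM Products p
JOIN (
    -- highest UniqueId among each product's rows that reach the max pipe count
    SELECT c1.ProductId, MAX(c1.UniqueId) AS WinnerId
    FROM ProductCategories c1
    JOIN (
        -- maximum pipe count per product
        SELECT ProductId,
               MAX(LENGTH(REPLACE(Category, '|', '||')) - LENGTH(Category)) AS MaxPipeCount
        FROM ProductCategories
        GROUP BY ProductId
    ) m ON m.ProductId = c1.ProductId
       AND m.MaxPipeCount = LENGTH(REPLACE(c1.Category, '|', '||')) - LENGTH(c1.Category)
    GROUP BY c1.ProductId
) w ON w.ProductId = p.ProductId
JOIN ProductCategories c ON c.UniqueId = w.WinnerId
ORDER BY p.ProductId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Taking MAX(UniqueId) is what makes ties deterministic: Love|Candy and Love|Valentines both have one pipe, and Love|Valentines was inserted last, so it has the higher id.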