SQL - How to select based on an aggregate

Posted: 2015-04-29 23:47:06

Tags: sql sql-server tsql

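I have been trying to put together a SQL query for a few hours and can't seem to get it right. Consider the following example tables, Products and ProductCategories: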

Products
--------
ProductId   ProductName
---------   -----------
1         | Achilles
2         | Hermes
3         | Apollo
4         | Zeus
5         | Poseidon
6         | Eros

ProductCategories
-----------------
ProductId   Category
---------   --------
1         | Wars
1         | Wars|Trojan
1         | Wars|Trojans|Mortals
1         | Toys|Games
2         | Travel
2         | Travel|Trade
2         | Communication|Language|Writing
5         | Oceanware
6         | Love
6         | Love|Candy
6         | Love|Valentines
3         | Sunshine
4         | Lightning

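The goal is to select the product ID, the product name, and one of the categories associated with the product. Each product ID/name should appear only once in the result, and the selected category should be the one containing the most pipe characters. If two (or more) of a product's categories tie for the most pipes, returning any one of them at random is fine.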
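The selection rule itself is easy to state outside SQL. Below is a minimal Python sketch. It assumes the rows are already loaded as (ProductId, Category) pairs. `pipe_count` and `deepest_category` are hypothetical helper names used only for illustration.

```python
# Sample rows copied from the question's ProductCategories table: (ProductId, Category)
CATEGORIES = [
    (1, "Wars"), (1, "Wars|Trojan"), (1, "Wars|Trojans|Mortals"), (1, "Toys|Games"),
    (2, "Travel"), (2, "Travel|Trade"), (2, "Communication|Language|Writing"),
    (5, "Oceanware"), (6, "Love"), (6, "Love|Candy"), (6, "Love|Valentines"),
    (3, "Sunshine"), (4, "Lightning"),
]


def pipe_count(category: str) -> int:
    # Length before minus length after deleting every '|', the same idea as
    # LEN(x) - LEN(REPLACE(x, '|', '')) in T-SQL.
    return len(category) - len(category.replace("|", ""))


def deepest_category(rows):
    """Map each ProductId to the category with the most pipes.

    On a tie the first row seen is kept; the question accepts any tied row.
    """
    best = {}
    for product_id, category in rows:
        current = best.get(product_id)
        if current is None or pipe_count(category) > pipe_count(current):
            best[product_id] = category
    return best


result = deepest_category(CATEGORIES)
for product_id in sorted(result):
    print(product_id, result[product_id])
```

The comparison uses a strict `>`, so the first of several tied categories wins. The question explicitly allows that.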

In other words, the query should produce this dataset:

ProductId   ProductName     Category
---------   -----------     --------
1         | Achilles      | Wars|Trojans|Mortals
2         | Hermes        | Communication|Language|Writing
3         | Apollo        | Sunshine
4         | Zeus          | Lightning
5         | Poseidon      | Oceanware
6         | Eros          | Love|Valentines

(Note that the category returned for Eros could also be Love|Candy; that is also acceptable.)

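So far I have this SQL. It obviously doesn't work, because it returns one row for each product/category combination instead of only the category with the most pipes: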

SELECT
    ProductId,
    ProductName,
    Category,
    MAX(PipeCount)
FROM
(
    SELECT DISTINCT
        p.ProductId AS ProductId,
        p.ProductName AS ProductName,
        c.Category AS Category,
        LEN(c.Category) - LEN(REPLACE(c.Category, '|', '')) AS PipeCount
    FROM
        Products p
        INNER JOIN ProductCategories c
        ON p.ProductId = c.ProductId
) Subquery
GROUP BY ProductId, ProductName, Category, PipeCount
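To see why this returns one row per product/category pair, here is a sketch that runs the same query against the sample data. It uses an in-memory SQLite database through Python's `sqlite3` module. SQLite is assumed here as a stand-in for SQL Server, so `LEN` is swapped for SQLite's `LENGTH`.

```python
import sqlite3

PRODUCTS = [(1, "Achilles"), (2, "Hermes"), (3, "Apollo"),
            (4, "Zeus"), (5, "Poseidon"), (6, "Eros")]
CATEGORIES = [
    (1, "Wars"), (1, "Wars|Trojan"), (1, "Wars|Trojans|Mortals"), (1, "Toys|Games"),
    (2, "Travel"), (2, "Travel|Trade"), (2, "Communication|Language|Writing"),
    (5, "Oceanware"), (6, "Love"), (6, "Love|Candy"), (6, "Love|Valentines"),
    (3, "Sunshine"), (4, "Lightning"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductId INTEGER, ProductName TEXT)")
conn.execute("CREATE TABLE ProductCategories (ProductId INTEGER, Category TEXT)")
conn.executemany("INSERT INTO Products VALUES (?, ?)", PRODUCTS)
conn.executemany("INSERT INTO ProductCategories VALUES (?, ?)", CATEGORIES)

# The question's query, with LEN replaced by SQLite's LENGTH.
QUERY = """
SELECT ProductId, ProductName, Category, MAX(PipeCount)
FROM (
    SELECT DISTINCT
        p.ProductId AS ProductId,
        p.ProductName AS ProductName,
        c.Category AS Category,
        LENGTH(c.Category) - LENGTH(REPLACE(c.Category, '|', '')) AS PipeCount
    FROM Products p
    INNER JOIN ProductCategories c ON p.ProductId = c.ProductId
) AS Subquery
GROUP BY ProductId, ProductName, Category, PipeCount
"""
rows = conn.execute(QUERY).fetchall()
# Category is in the GROUP BY, so every product/category pair is its own group
# and MAX(PipeCount) simply echoes that row's own count.
print(len(rows), "rows for", len(PRODUCTS), "products")
```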

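However, I can't seem to get this query any closer than that. I want to return only one row per product: the row whose PipeCount is the maximum PipeCount among all of that product's rows. Any help would be appreciated. Note that this is not my actual data. The real data is much more complex, but this example should suffice. I'm working on SQL Server 2012, but I'd like a good answer to be compatible with almost any version of SQL.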
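If window functions are unavailable, one portable pattern is to join each row to its product's maximum pipe count and then collapse ties with an aggregate. This pattern is not taken from this thread's answers. The sketch below runs it in SQLite through Python's `sqlite3`; on SQL Server, swap `LENGTH` for `LEN`. Breaking ties with `MAX(Category)` is an arbitrary but deterministic choice made for this example.

```python
import sqlite3

PRODUCTS = [(1, "Achilles"), (2, "Hermes"), (3, "Apollo"),
            (4, "Zeus"), (5, "Poseidon"), (6, "Eros")]
CATEGORIES = [
    (1, "Wars"), (1, "Wars|Trojan"), (1, "Wars|Trojans|Mortals"), (1, "Toys|Games"),
    (2, "Travel"), (2, "Travel|Trade"), (2, "Communication|Language|Writing"),
    (5, "Oceanware"), (6, "Love"), (6, "Love|Candy"), (6, "Love|Valentines"),
    (3, "Sunshine"), (4, "Lightning"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductId INTEGER, ProductName TEXT)")
conn.execute("CREATE TABLE ProductCategories (ProductId INTEGER, Category TEXT)")
conn.executemany("INSERT INTO Products VALUES (?, ?)", PRODUCTS)
conn.executemany("INSERT INTO ProductCategories VALUES (?, ?)", CATEGORIES)

# Keep only rows whose pipe count equals their product's maximum (subquery m),
# then collapse any remaining ties with MAX(Category). No window functions.
QUERY = """
SELECT p.ProductId, p.ProductName, MAX(c.Category) AS Category
FROM Products p
INNER JOIN ProductCategories c ON c.ProductId = p.ProductId
INNER JOIN (
    SELECT ProductId,
           MAX(LENGTH(Category) - LENGTH(REPLACE(Category, '|', ''))) AS MaxPipes
    FROM ProductCategories
    GROUP BY ProductId
) m ON m.ProductId = c.ProductId
   AND m.MaxPipes = LENGTH(c.Category) - LENGTH(REPLACE(c.Category, '|', ''))
GROUP BY p.ProductId, p.ProductName
ORDER BY p.ProductId
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

For Eros, the two one-pipe categories tie, and `MAX` picks `Love|Valentines` because `V` sorts after `C`.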

3 Answers:

Answer 0 (score: 3)

You can use ROW_NUMBER to pick, for each ProductId, the Category with the most pipe characters:

SELECT p.*, pc.Category
FROM Products p
INNER JOIN (
    SELECT *,
        RN = ROW_NUMBER() OVER(PARTITION BY ProductId
                               ORDER BY LEN(Category) - LEN(REPLACE(Category, '|', '')) DESC)
    FROM ProductCategories
) pc ON pc.ProductId = p.ProductId
WHERE RN = 1

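Below is a sketch of this answer's query running against the sample data in SQLite. It assumes Python's bundled SQLite is 3.25 or newer, the first release with window functions. SQLite does not accept T-SQL's `RN = ROW_NUMBER() ...` alias form, so the alias is written as `... AS RN`, and `LEN` becomes `LENGTH`.

```python
import sqlite3

PRODUCTS = [(1, "Achilles"), (2, "Hermes"), (3, "Apollo"),
            (4, "Zeus"), (5, "Poseidon"), (6, "Eros")]
CATEGORIES = [
    (1, "Wars"), (1, "Wars|Trojan"), (1, "Wars|Trojans|Mortals"), (1, "Toys|Games"),
    (2, "Travel"), (2, "Travel|Trade"), (2, "Communication|Language|Writing"),
    (5, "Oceanware"), (6, "Love"), (6, "Love|Candy"), (6, "Love|Valentines"),
    (3, "Sunshine"), (4, "Lightning"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductId INTEGER, ProductName TEXT)")
conn.execute("CREATE TABLE ProductCategories (ProductId INTEGER, Category TEXT)")
conn.executemany("INSERT INTO Products VALUES (?, ?)", PRODUCTS)
conn.executemany("INSERT INTO ProductCategories VALUES (?, ?)", CATEGORIES)

# Number each product's categories by descending pipe count; keep row 1.
QUERY = """
SELECT p.*, pc.Category
FROM Products p
INNER JOIN (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY ProductId
               ORDER BY LENGTH(Category) - LENGTH(REPLACE(Category, '|', '')) DESC
           ) AS RN
    FROM ProductCategories
) pc ON pc.ProductId = p.ProductId
WHERE pc.RN = 1
ORDER BY p.ProductId
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The `ORDER BY` has no tie-breaker, so it is not guaranteed which of Eros's two one-pipe categories gets `RN = 1`.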

Answer 1 (score: 2)

Here is a solution using ROW_NUMBER:

--CTE as data sample for two tables
;
WITH    Products
          AS ( SELECT   *
               FROM     ( VALUES ( 1, 'Achilles'), ( 2, 'Hermes'),
                        ( 3, 'Apollo'), ( 4, 'Zeus'), ( 5, 'Poseidon'),
                        ( 6, 'Eros') ) AS t ( ProductId, ProductName )
             ),
        ProductCategories
          AS ( SELECT   *
               FROM     ( VALUES ( 1, 'Wars'), ( 1, 'Wars|Trojan'),
                        ( 1, 'Wars|Trojans|Mortals'), ( 1, 'Toys|Games'),
                        ( 2, 'Travel'), ( 2, 'Travel|Trade'),
                        ( 2, 'Communication|Language|Writing'),
                        ( 5, 'Oceanware'), ( 6, 'Love'),
                        ( 6, 'Love|Candy'), ( 6, 'Love|Valentines'),
                        ( 3, 'Sunshine'), ( 4, 'Lightning') ) AS T ( ProductId, CategoryName )
             )

--Final Query

    SELECT  T.ProductId ,
            T.ProductName ,
            T.CategoryName
    FROM    ( SELECT    P.ProductID ,
                        P.ProductName ,
                        C.CategoryName ,
                        LEN(C.CategoryName) - LEN(REPLACE(C.CategoryName, '|', '')) AS Pipes ,
                        ROW_NUMBER() OVER ( PARTITION BY P.ProductID ORDER BY LEN(C.CategoryName)
                                            - LEN(REPLACE(C.CategoryName, '|',
                                                          '')) DESC, LEN(C.CategoryName) DESC ) AS RN
              FROM      Products AS P
                        JOIN ProductCategories AS C ON P.ProductId = C.ProductId
            ) AS T
    WHERE   T.RN = 1
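The second `ORDER BY` key, `LEN(C.CategoryName) DESC`, makes ties deterministic for this data: among categories with equally many pipes, the longest string wins. Below is a sketch checking that in SQLite, assuming version 3.25+ for `ROW_NUMBER`. SQLite does not support the `AS t ( ProductId, ProductName )` column list on a `VALUES` derived table, so the CTEs name their columns in the `WITH` clause instead.

```python
import sqlite3

QUERY = """
WITH Products (ProductId, ProductName) AS (
    VALUES (1, 'Achilles'), (2, 'Hermes'), (3, 'Apollo'),
           (4, 'Zeus'), (5, 'Poseidon'), (6, 'Eros')
),
ProductCategories (ProductId, CategoryName) AS (
    VALUES (1, 'Wars'), (1, 'Wars|Trojan'), (1, 'Wars|Trojans|Mortals'),
           (1, 'Toys|Games'), (2, 'Travel'), (2, 'Travel|Trade'),
           (2, 'Communication|Language|Writing'), (5, 'Oceanware'),
           (6, 'Love'), (6, 'Love|Candy'), (6, 'Love|Valentines'),
           (3, 'Sunshine'), (4, 'Lightning')
)
SELECT T.ProductId, T.ProductName, T.CategoryName
FROM (
    SELECT P.ProductId AS ProductId,
           P.ProductName AS ProductName,
           C.CategoryName AS CategoryName,
           ROW_NUMBER() OVER (
               PARTITION BY P.ProductId
               -- most pipes first; on a tie, the longer string wins
               ORDER BY LENGTH(C.CategoryName)
                        - LENGTH(REPLACE(C.CategoryName, '|', '')) DESC,
                        LENGTH(C.CategoryName) DESC
           ) AS RN
    FROM Products AS P
    JOIN ProductCategories AS C ON P.ProductId = C.ProductId
) AS T
WHERE T.RN = 1
ORDER BY T.ProductId
"""
rows = sqlite3.connect(":memory:").execute(QUERY).fetchall()
for row in rows:
    print(row)
```

For Eros, `Love|Valentines` (15 characters) beats `Love|Candy` (10 characters). The result therefore no longer depends on the order in which the engine reads the rows.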

Answer 2 (score: 0)

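I eventually solved the problem with a series of subqueries. One caveat: it relies on the ProductCategories table in my example having a unique column, which I didn't explicitly show. In my real data this column already exists. Anyone facing a similar problem could add such a column to make this solution work. Here is the SQL: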

SELECT
    Sub1.ProductId,
    Sub3.Category
FROM
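(
    -- Sub1: the highest pipe count per product (doubling each '|' lengthens the string by one per pipe)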
    SELECT
        o.ProductId AS ProductId,
        MAX(LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category)) AS MaxPipeCount
    FROM
        Products o
        INNER JOIN ProductCategories c
            ON o.ProductId = c.ProductId
    GROUP BY o.ProductID
) Sub1
INNER JOIN
(
    -- Sub2: for each product and pipe count, the largest UniqueId (used to break ties)
    SELECT
        o.ProductId AS ProductId,
        LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category) AS PipeCount,
        MAX(c.UniqueId) AS MaxUniqueId
    FROM
        Products o
        INNER JOIN ProductCategories c
            ON o.ProductId = c.ProductId
    GROUP BY o.ProductID, LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category)
) Sub2
    ON Sub1.MaxPipeCount = Sub2.PipeCount
    AND Sub1.ProductId = Sub2.ProductId
INNER JOIN
(
    -- Sub3: every category row with its pipe count and UniqueId
    SELECT DISTINCT
        o.ProductId,
        c.Category,
        LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category) AS PipeCount,
        c.UniqueId
    FROM 
        Products o
        INNER JOIN ProductCategories c
            ON o.ProductId = c.ProductId
) Sub3
    ON Sub1.MaxPipeCount = Sub3.PipeCount
    AND Sub2.MaxUniqueId = Sub3.UniqueId
    AND Sub1.ProductId = Sub3.ProductId
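Below is a sketch of this subquery approach on the sample data, run in SQLite via Python's `sqlite3`. The `UniqueId` column is an assumption standing in for the unique column the answer mentions. Here it is an `INTEGER PRIMARY KEY`, which SQLite numbers 1, 2, 3, ... in insert order on an empty table. `LEN` becomes `LENGTH`. Doubling each `|` and subtracting the original length counts pipes just like the question's `REPLACE(..., '|', '')` version.

```python
import sqlite3

PRODUCTS = [(1, "Achilles"), (2, "Hermes"), (3, "Apollo"),
            (4, "Zeus"), (5, "Poseidon"), (6, "Eros")]
CATEGORIES = [
    (1, "Wars"), (1, "Wars|Trojan"), (1, "Wars|Trojans|Mortals"), (1, "Toys|Games"),
    (2, "Travel"), (2, "Travel|Trade"), (2, "Communication|Language|Writing"),
    (5, "Oceanware"), (6, "Love"), (6, "Love|Candy"), (6, "Love|Valentines"),
    (3, "Sunshine"), (4, "Lightning"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductId INTEGER, ProductName TEXT)")
# Hypothetical UniqueId: SQLite assigns INTEGER PRIMARY KEY values 1, 2, 3, ...
conn.execute("CREATE TABLE ProductCategories "
             "(UniqueId INTEGER PRIMARY KEY, ProductId INTEGER, Category TEXT)")
conn.executemany("INSERT INTO Products VALUES (?, ?)", PRODUCTS)
conn.executemany("INSERT INTO ProductCategories (ProductId, Category) VALUES (?, ?)",
                 CATEGORIES)

# Pipe count via the answer's doubling trick.
PIPES = "LENGTH(REPLACE(c.Category, '|', '||')) - LENGTH(c.Category)"
QUERY = f"""
SELECT Sub1.ProductId, Sub3.Category
FROM (
    SELECT o.ProductId AS ProductId, MAX({PIPES}) AS MaxPipeCount
    FROM Products o INNER JOIN ProductCategories c ON o.ProductId = c.ProductId
    GROUP BY o.ProductId
) Sub1
INNER JOIN (
    SELECT o.ProductId AS ProductId, {PIPES} AS PipeCount,
           MAX(c.UniqueId) AS MaxUniqueId
    FROM Products o INNER JOIN ProductCategories c ON o.ProductId = c.ProductId
    GROUP BY o.ProductId, {PIPES}
) Sub2
    ON Sub1.MaxPipeCount = Sub2.PipeCount AND Sub1.ProductId = Sub2.ProductId
INNER JOIN (
    SELECT DISTINCT o.ProductId AS ProductId, c.Category AS Category,
           {PIPES} AS PipeCount, c.UniqueId AS UniqueId
    FROM Products o INNER JOIN ProductCategories c ON o.ProductId = c.ProductId
) Sub3
    ON Sub1.MaxPipeCount = Sub3.PipeCount
    AND Sub2.MaxUniqueId = Sub3.UniqueId
    AND Sub1.ProductId = Sub3.ProductId
ORDER BY Sub1.ProductId
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

When several categories tie on pipe count, the row with the largest `UniqueId` wins, so Eros gets `Love|Valentines` here. Unlike the question's target output, this query does not return ProductName.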