GROUP BY one column; select an arbitrary value for another

Asked: 2013-03-09 19:24:24

Tags: sql sql-server

I'm trying to select one row per user. I don't care which image I get. This query works in MySQL, but not in SQL Server:

SELECT user.id, (images.path + images.name) as 'image_path'
FROM users
JOIN images ON images.user_id = users.id
GROUP BY users.id

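6 answers:

Answer 0 (score: 12)

The solutions posted so far, using MIN/MAX aggregates or ROW_NUMBER, may not be the most efficient (depending on data distribution), because they typically have to examine all matching rows before choosing one per group.

Using the AdventureWorks sample database to illustrate, the following queries all select a single TransactionType and ReferenceOrderID from the Transaction History table for each ProductID:

Using MIN/MAX aggregates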

SELECT
    p.ProductID,
    MIN(th.TransactionType + STR(th.ReferenceOrderID, 11))
FROM Production.Product AS p
INNER JOIN Production.TransactionHistory AS th ON
    th.ProductID = p.ProductID
GROUP BY
    p.ProductID;

[Image: Aggregate query plan]

Using ROW_NUMBER

WITH x AS 
(
    SELECT 
        th.ProductID, 
        th.TransactionType, 
        th.ReferenceOrderID,
        rn = ROW_NUMBER() OVER (PARTITION BY th.ProductID ORDER BY (SELECT NULL))
    FROM Production.TransactionHistory AS th
)
SELECT
    p.ProductID,
    x.TransactionType,
    x.ReferenceOrderID
FROM Production.Product AS p
INNER JOIN x ON x.ProductID = p.ProductID
WHERE
    x.rn = 1
OPTION (MAXDOP 1);

[Image: Row number plan]

Using the internal-only ANY aggregate

SELECT
    q.ProductID, 
    q.TransactionType, 
    q.ReferenceOrderID 
FROM 
(
    SELECT 
        p.ProductID, 
        th.TransactionType, 
        th.ReferenceOrderID,
        rn = ROW_NUMBER() OVER (
            PARTITION BY p.ProductID 
            ORDER BY p.ProductID)
    FROM Production.Product AS p
    JOIN Production.TransactionHistory AS th ON p.ProductID = th.ProductID
) AS q
WHERE
    q.rn = 1;

See this blog post for details about the ANY aggregate.

[Image: ANY aggregate]

Using a correlated subquery with a non-deterministic TOP

SELECT p.ProductID,
    (
    -- No ORDER BY, so could be any row
    SELECT TOP (1) 
        th.TransactionType + STR( th.ReferenceOrderID, 11)
    FROM Production.TransactionHistory AS th WITH (FORCESEEK) 
    WHERE
        th.ProductID = p.ProductID
    )
FROM Production.Product AS p;

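[Image: TOP 1]

Using CROSS APPLY with TOP (1)

The previous query requires string concatenation and returns NULL for products with no transaction history. Using CROSS APPLY with TOP solves both problems: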

SELECT
    p.Name, 
    ca.TransactionType,
    ca.ReferenceOrderID
FROM Production.Product AS p
CROSS APPLY
(
    SELECT TOP (1) 
        th.TransactionType,
        th.ReferenceOrderID
    FROM Production.TransactionHistory AS th WITH (FORCESEEK) 
    WHERE 
        th.ProductID = p.ProductID
) AS ca;

[Image: CROSS APPLY plan]

With optimal indexing, APPLY is likely to be the most efficient option if each user typically has many images.
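Applied to the question's tables, the same pattern might look like the sketch below. The temporary tables, the sample rows, and the index ix_images_user_id are assumptions made up for illustration; the question does not show its actual schema or indexes.

```sql
-- Hypothetical schema mirroring the question's users/images tables.
CREATE TABLE #users (id int PRIMARY KEY);
CREATE TABLE #images
(
    id int IDENTITY PRIMARY KEY,
    user_id int NOT NULL,
    path varchar(100) NOT NULL,
    name varchar(100) NOT NULL
);

-- Assumed index: supports a seek on user_id and covers path and name.
CREATE INDEX ix_images_user_id ON #images (user_id) INCLUDE (path, name);

INSERT #users (id) VALUES (1), (2), (3);
INSERT #images (user_id, path, name)
VALUES (1, '/a/', 'x.png'),
       (1, '/b/', 'y.png'),
       (2, '/c/', 'z.png');

-- One arbitrary image per user. No ORDER BY, so any matching row may be chosen.
-- User 3 has no images and is not returned by CROSS APPLY.
SELECT u.id, ca.path + ca.name AS image_path
FROM #users AS u
CROSS APPLY
(
    SELECT TOP (1) i.path, i.name
    FROM #images AS i
    WHERE i.user_id = u.id
) AS ca;
```

Whether this actually beats the aggregate or ROW_NUMBER forms depends on the real indexes and data distribution, so it is worth comparing the plans.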

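Answer 1 (score: 4)

If a user has more than one image and you only want one, which one do you want? MySQL has loosey-goosey syntax that doesn't force you to make the choice and just gives you any old arbitrary value; SQL Server makes you choose. One way is MIN: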

SELECT u.id, MIN(i.path + i.name) AS image_path
FROM dbo.users AS u
INNER JOIN dbo.images AS i
ON u.id = i.user_id
GROUP BY u.id;

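You could also use MAX instead of MIN. Depending on the version of SQL Server, and on whether you actually need more columns, there may be other ways to do this more efficiently (avoiding some of the sort/group work). For example, if you want the path and name separately, this won't work out well: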

SELECT u.id, MIN(i.path), MIN(i.name)
FROM dbo.users AS u
INNER JOIN dbo.images AS i
ON u.id = i.user_id
GROUP BY u.id;

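...because in theory you could get the path and the name from two different rows, and the result would no longer make sense. So you could do this instead: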

;WITH x AS 
(
  SELECT user_id, path, name, rn = ROW_NUMBER() OVER 
    (PARTITION BY user_id ORDER BY (SELECT NULL))
  FROM dbo.images
)
SELECT u.id, x.path, x.name
FROM dbo.users AS u
INNER JOIN x
ON u.id = x.user_id
WHERE x.rn = 1;
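To make the mismatch concrete, here is a small self-contained sketch comparing the two approaches. The table variable @images and its two rows are invented for illustration and are not from the question.

```sql
-- Hypothetical sample data: one user with two images.
DECLARE @images TABLE
(
    user_id int NOT NULL,
    path varchar(100) NOT NULL,
    name varchar(100) NOT NULL
);

INSERT @images (user_id, path, name)
VALUES (1, '/a/', 'zebra.png'),
       (1, '/z/', 'apple.png');

-- Separate aggregates: each MIN is evaluated independently, so this pairs
-- path '/a/' with name 'apple.png', a combination that exists in no row.
SELECT user_id, MIN(path) AS path, MIN(name) AS name
FROM @images
GROUP BY user_id;

-- ROW_NUMBER keeps both columns from the same (arbitrary) row.
WITH x AS
(
    SELECT user_id, path, name,
        rn = ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY (SELECT NULL))
    FROM @images
)
SELECT user_id, path, name
FROM x
WHERE rn = 1;
```

The second query returns one of the two real rows; which one is not guaranteed, because the window has no meaningful ordering.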

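Whether this variation makes sense in the existing case depends largely on how the two tables are indexed, but you could try this approach and compare the plans and performance: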

;WITH x AS 
(
  SELECT user_id, path + name AS image_path, rn = ROW_NUMBER() OVER 
    (PARTITION BY user_id ORDER BY (SELECT NULL))
  FROM dbo.images
)
SELECT u.id, x.image_path
FROM dbo.users AS u
INNER JOIN x
ON u.id = x.user_id
WHERE x.rn = 1;

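(Also experiment with replacing SELECT NULL with the leading column of a narrow index on dbo.images.)

P.S. Don't use the AS 'alias' syntax. That form is deprecated, and it makes the alias look like a string literal. Also, always use the schema prefix, and use table aliases so you don't have to repeat the full table name throughout the query.

Answer 2 (score: 3)

You need an aggregate function. The *right* aggregate function depends on the application, which means you are the only one who can say. A crude hack: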

SELECT users.id, max((images.path + images.name)) as 'image_path'
FROM users
JOIN images ON images.user_id = users.id
GROUP BY users.id

MySQL's handling of the GROUP BY clause is widely considered to be BAD.

Answer 3 (score: 2)

Use MAX or MIN as needed:

SELECT users.id, max(images.path + images.name) as image_path
FROM users
      JOIN images ON images.user_id = users.id
GROUP BY users.id

Answer 4 (score: 1)

If a user has multiple images, this picks the first entry (alphabetically):

SELECT users.id, min(images.path + images.name) as image_path
FROM users
JOIN images ON images.user_id = users.id
GROUP BY users.id

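Answer 5 (score: 1)

When using GROUP BY, you can only select the columns you group by and aggregate functions of the other columns.

Here is one way to do that:

SELECT users.id, (MAX(images.path) + MAX(images.name)) as 'image_path'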
FROM users
JOIN images ON images.user_id = users.id
GROUP BY users.id

Though you more likely want:

SELECT users.id, MAX(images.path + images.name) as 'image_path'
FROM users
JOIN images ON images.user_id = users.id
GROUP BY users.id