Efficiently pulling different columns using a common correlated subquery

Time: 2011-05-27 21:22:14

Tags: sql sql-server sql-server-2008

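I need to pull multiple columns from a subquery whose WHERE filter also has to reference columns of the FROM table. I have a couple of questions:

  1. Is there another way to solve this besides mine below?
  2. Is another solution even necessary, or is this solution efficient enough?

Example:

In the example below, I'm writing a view to present test scores, specifically to discover failures that may need to be addressed or retaken.

I can't simply use a JOIN, because I need to filter my actual subquery first (notice that I take the TOP 1 for each examinee, sorted by either score or date descending).

My goal is to avoid writing (and executing) essentially the same subquery over and over.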

    SELECT ExamineeID, LastName, FirstName, Email,
       (SELECT COUNT(examineeTestID)
        FROM exam.ExamineeTest tests
        WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2) Attempts,
       (SELECT TOP 1 ExamineeTestID
        FROM exam.ExamineeTest T
        WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
        ORDER BY Score DESC) bestExamineeTestID,
       (SELECT TOP 1 Score
        FROM exam.ExamineeTest T
        WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
        ORDER BY Score DESC) bestScore,
       (SELECT TOP 1 DateDue
        FROM exam.ExamineeTest T
        WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
        ORDER BY Score DESC) bestDateDue,
       (SELECT TOP 1 TimeCommitted
        FROM exam.ExamineeTest T
        WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
        ORDER BY Score DESC) bestTimeCommitted,
       (SELECT TOP 1 ExamineeTestID
        FROM exam.ExamineeTest T
        WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
        ORDER BY DateDue DESC) currentExamineeTestID,
       (SELECT TOP 1 Score
        FROM exam.ExamineeTest T
        WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
        ORDER BY DateDue DESC) currentScore,
       (SELECT TOP 1 DateDue
        FROM exam.ExamineeTest T
        WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
        ORDER BY DateDue DESC) currentDateDue,
       (SELECT TOP 1 TimeCommitted
        FROM exam.ExamineeTest T
        WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
        ORDER BY DateDue DESC) currentTimeCommitted
    FROM exam.Examinee E
    

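4 Answers:

Answer 0 (score: 9)

To answer your second question first: yes, a better way is in order. The query you're using is hard to understand and hard to maintain, and even if its performance is acceptable now, it's a shame to query the same table multiple times when you don't need to. On top of that, the performance may not stay acceptable if your application ever grows to an appreciable size.

To answer your first question, I have a few methods for you. They assume SQL 2005 or later unless otherwise noted.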

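Note that I've left out BestExamineeID and CurrentExamineeID, on the assumption that they are always the same as ExamineeID, or NULL when no tests were taken, which you can tell from the other columns being NULL. If the question's bestExamineeTestID and currentExamineeTestID are really the IDs of the test rows, just add ExamineeTestID to the relevant select lists.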

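You can think of OUTER/CROSS APPLY as an operator that lets you move correlated subqueries from the WHERE clause into the JOIN clause. They can have outer references to previously named tables, and they can return more than one column. This lets you do the job once per logical query rather than once per column.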

SELECT
   ExamineeID,
   LastName,
   FirstName,
   Email,
   B.Attempts,
   BestScore = B.Score,
   BestDateDue = B.DateDue,
   BestTimeCommitted = B.TimeCommitted,
   CurrentScore = C.Score,
   CurrentDateDue = C.DateDue,
   CurrentTimeCommitted = C.TimeCommitted
FROM
   exam.Examinee E
   OUTER APPLY ( -- change to CROSS APPLY if you only want examinees who've tested
      SELECT TOP 1
         Score, DateDue, TimeCommitted,
         Attempts = Count(*) OVER ()
      FROM exam.ExamineeTest T
      WHERE
         E.ExamineeID = T.ExamineeID
         AND T.TestRevisionID = 3
         AND T.TestID = 2
      ORDER BY Score DESC
   ) B
   OUTER APPLY ( -- change to CROSS APPLY if you only want examinees who've tested
      SELECT TOP 1
         Score, DateDue, TimeCommitted
      FROM exam.ExamineeTest T
      WHERE
         E.ExamineeID = T.ExamineeID
         AND T.TestRevisionID = 3
         AND T.TestID = 2
      ORDER BY DateDue DESC
   ) C

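You should experiment to see whether my Count(*) OVER () is better than adding another OUTER APPLY that just gets the count. If you aren't restricting the examinees from the exam.Examinee table, it may be better to do a normal aggregate in a derived table.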
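A minimal, untested sketch of that "normal aggregate in a derived table" variant follows. It assumes the same exam.Examinee and exam.ExamineeTest columns used in the question; the alias A and the Coalesce to 0 are illustrative choices, not something the question asks for.

```sql
SELECT
   E.ExamineeID,
   E.LastName,
   E.FirstName,
   E.Email,
   Attempts = Coalesce(A.Attempts, 0), -- 0 rather than NULL for examinees with no tests
   BestScore = B.Score,
   BestDateDue = B.DateDue,
   BestTimeCommitted = B.TimeCommitted
FROM
   exam.Examinee E
   LEFT JOIN ( -- one pass over the test table to count attempts per examinee
      SELECT ExamineeID, Attempts = Count(*)
      FROM exam.ExamineeTest
      WHERE TestRevisionID = 3 AND TestID = 2
      GROUP BY ExamineeID
   ) A ON E.ExamineeID = A.ExamineeID
   OUTER APPLY ( -- best-scoring attempt, without the windowed count
      SELECT TOP 1 Score, DateDue, TimeCommitted
      FROM exam.ExamineeTest T
      WHERE
         E.ExamineeID = T.ExamineeID
         AND T.TestRevisionID = 3
         AND T.TestID = 2
      ORDER BY Score DESC
   ) B
```

The most-recent-attempt APPLY would be added in the same way, ordered by DateDue DESC. Whether this beats the windowed count depends on the data, so compare execution plans.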

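Here's another method that (sort of) goes and gets all the data in one fell swoop. It could conceivably perform better than the other queries, except that in my experience windowing functions can get surprisingly expensive in some situations, so testing is in order.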

WITH Data AS (
   SELECT
      *,
      Count(*) OVER (PARTITION BY ExamineeID) Cnt,
      Row_Number() OVER (PARTITION BY ExamineeID ORDER BY Score DESC) ScoreOrder,
      Row_Number() OVER (PARTITION BY ExamineeID ORDER BY DateDue DESC) DueOrder
   FROM
      exam.ExamineeTest
   WHERE
      TestRevisionID = 3 -- same filter as the subqueries in the question
      AND TestID = 2
), Vals AS (
   SELECT
      ExamineeID,
      Max(Cnt) Attempts,
      Max(CASE WHEN ScoreOrder = 1 THEN Score ELSE NULL END) BestScore,
      Max(CASE WHEN ScoreOrder = 1 THEN DateDue ELSE NULL END) BestDateDue,
      Max(CASE WHEN ScoreOrder = 1 THEN TimeCommitted ELSE NULL END) BestTimeCommitted,
      Max(CASE WHEN DueOrder = 1 THEN Score ELSE NULL END) CurrentScore,
      Max(CASE WHEN DueOrder = 1 THEN DateDue ELSE NULL END) CurrentDateDue,
      Max(CASE WHEN DueOrder = 1 THEN TimeCommitted ELSE NULL END) CurrentTimeCommitted
   FROM Data
   GROUP BY
      ExamineeID
)
SELECT
   E.ExamineeID,
   E.LastName,
   E.FirstName,
   E.Email,
   V.Attempts,
   V.BestScore, V.BestDateDue, V.BestTimeCommitted,
   V.CurrentScore, V.CurrentDateDue, V.CurrentTimeCommitted
FROM
   exam.Examinee E
   LEFT JOIN Vals V ON E.ExamineeID = V.ExamineeID
   -- change join to INNER if you only want examinees who've tested

Finally, here's a SQL 2000 method:

SELECT
   E.ExamineeID,
   E.LastName,
   E.FirstName,
   E.Email,
   Y.Attempts,
   Y.BestScore, Y.BestDateDue, Y.BestTimeCommitted,
   Y.CurrentScore, Y.CurrentDateDue, Y.CurrentTimeCommitted
FROM
   exam.Examinee E
   LEFT JOIN ( -- change to inner if you only want examinees who've tested
      SELECT
         X.ExamineeID,
         X.Cnt Attempts,
         Max(CASE Y.Which WHEN 1 THEN T.Score ELSE NULL END) BestScore,
         Max(CASE Y.Which WHEN 1 THEN T.DateDue ELSE NULL END) BestDateDue,
         Max(CASE Y.Which WHEN 1 THEN T.TimeCommitted ELSE NULL END) BestTimeCommitted,
         Max(CASE Y.Which WHEN 2 THEN T.Score ELSE NULL END) CurrentScore,
         Max(CASE Y.Which WHEN 2 THEN T.DateDue ELSE NULL END) CurrentDateDue,
         Max(CASE Y.Which WHEN 2 THEN T.TimeCommitted ELSE NULL END) CurrentTimeCommitted
      FROM
         (
            SELECT ExamineeID, Max(Score) MaxScore, Max(DateDue) MaxDueDate, Count(*) Cnt
            FROM exam.ExamineeTest
            WHERE
               TestRevisionID = 3
               AND TestID = 2
            GROUP BY ExamineeID
         ) X
         CROSS JOIN (SELECT 1 UNION ALL SELECT 2) Y (Which)
         INNER JOIN exam.ExamineeTest T
            ON X.ExamineeID = T.ExamineeID
            AND (
               (Y.Which = 1 AND X.MaxScore = T.Score)
               OR (Y.Which = 2 AND X.MaxDueDate = T.DateDue)
            )
      WHERE
         T.TestRevisionID = 3
         AND T.TestID = 2
      GROUP BY
         X.ExamineeID,
         X.Cnt
   ) Y ON E.ExamineeID = Y.ExamineeID

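If the combination of (ExamineeID, Score) or (ExamineeID, DateDue) can match more than one row, this query will give unexpected results: the Max() expressions run over all of the tied rows, so the Best or Current columns can end up mixing values from different tests. Ties are not at all unlikely for Score. If neither combination is unique, you need to use (or add) some additional column that can grant uniqueness so that it can be used to select a single row. If only Score can have duplicates, then an extra pre-query that gets the max Score first and then dovetails in the max DateDue among those rows would pull the most recent of the tied top scores, at the same time as getting the most recent data. Let me know if you need more SQL 2000 help.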

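Note: the biggest thing controlling whether the CROSS APPLY or the ROW_NUMBER() solution performs better is whether you have an index on the columns being looked up, and whether the data is dense or sparse.

  • Index + you're pulling only a few examinees who each have a lot of tests = CROSS APPLY wins.
  • Index + you're pulling a huge number of examinees who each have only a few tests = ROW_NUMBER() wins.
  • No index = the string concatenation/value packing method wins (not shown here).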
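For reference, "index" in the cases above means something along these lines. This is only a sketch: the index names are made up, and the column choices are an assumption based on the filters and sort orders of the queries in this thread, so check them against your real execution plans.

```sql
-- Hypothetical index name; key columns follow the WHERE and ORDER BY of the "best score" lookup.
CREATE NONCLUSTERED INDEX IX_ExamineeTest_BestScore
   ON exam.ExamineeTest (ExamineeID, TestID, TestRevisionID, Score DESC)
   INCLUDE (DateDue, TimeCommitted); -- covers the other selected columns

-- Hypothetical matching index for the "most recent" lookup, ordered by DateDue instead.
CREATE NONCLUSTERED INDEX IX_ExamineeTest_Current
   ON exam.ExamineeTest (ExamineeID, TestID, TestRevisionID, DateDue DESC)
   INCLUDE (Score, TimeCommitted);
```

INCLUDE columns require SQL 2005 or later. With indexes like these, each TOP 1 lookup can usually be answered by a single seek per examinee.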

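The GROUP BY solution that I gave for SQL 2000 will probably perform the worst, but that's not guaranteed. Like I said, testing is in order.

If any of my queries do give you performance problems, let me know and I'll see what I can do to help. I'm sure I have some typos, since I didn't work up any DDL to recreate your tables, but I did my best without trying it.

If performance really does become crucial, I would create ExamineeTestBest and ExamineeTestCurrent tables that get pushed to by a trigger on the ExamineeTest table, which would always keep them up to date. However, this is denormalization, and it's probably neither necessary nor a good idea unless you've scaled so large that retrieving results takes unacceptably long.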

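Answer 1 (score: 4)

This is not the same subquery. It's three different subqueries:

  • count() over all rows
  • TOP (1) ORDER BY Score DESC
  • TOP (1) ORDER BY DateDue DESC

You can't get away with executing it fewer than 3 times. The question is how to make it execute no more than 3 times.

One option is to write 3 inline table functions and use them with outer apply. Make sure they really are inline, otherwise your performance will drop a hundredfold. One of these three functions might be: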

create function dbo.topexaminee_byscore(@ExamineeID int)
returns table
as
return (
  SELECT top (1)
    ExamineeTestID as bestExamineeTestID,
    Score as bestScore,
    DateDue as bestDateDue,
    TimeCommitted as bestTimeCommitted
  FROM exam.ExamineeTest
  WHERE (ExamineeID = @ExamineeID) AND (TestRevisionID = 3) AND (TestID = 2)
  ORDER BY Score DESC
)
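Assuming the function above has been created, using it with outer apply might look like the following sketch. Only dbo.topexaminee_byscore comes from this answer; the aliases E and B are arbitrary.

```sql
SELECT
  E.ExamineeID, E.LastName, E.FirstName, E.Email,
  B.bestExamineeTestID, B.bestScore, B.bestDateDue, B.bestTimeCommitted
FROM exam.Examinee E
OUTER APPLY dbo.topexaminee_byscore(E.ExamineeID) B -- columns are NULL when the examinee has no attempts
```

The other two functions would be applied the same way, each with its own alias.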

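Another option is to do essentially the same thing, but with subqueries. Because you're fetching data for all students anyway, there shouldn't be much of a difference performance-wise. Create three subqueries, for example: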

select ExamineeID, bestExamineeTestID, bestScore, bestDateDue, bestTimeCommitted
from (
  SELECT
    ExamineeID, -- needed to join the result back to exam.Examinee
    ExamineeTestID as bestExamineeTestID,
    Score as bestScore,
    DateDue as bestDateDue,
    TimeCommitted as bestTimeCommitted,
    row_number() over (partition by ExamineeID order by Score DESC) as takeme
  FROM exam.ExamineeTest
  WHERE (TestRevisionID = 3) AND (TestID = 2)
) as foo
where foo.takeme = 1

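Do the same for ORDER BY DateDue DESC and for the count over all records, selecting the appropriate columns in each.

Join these three to the examinee table.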
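Put together, the join might look roughly like this untested sketch. The aliases total, best and cur are made up for illustration; the columns are the ones from the question.

```sql
SELECT
  E.ExamineeID, E.LastName, E.FirstName, E.Email,
  total.Attempts,
  best.bestExamineeTestID, best.bestScore, best.bestDateDue, best.bestTimeCommitted,
  cur.currentExamineeTestID, cur.currentScore, cur.currentDateDue, cur.currentTimeCommitted
FROM exam.Examinee E
LEFT JOIN (
  -- number of attempts per examinee
  SELECT ExamineeID, COUNT(*) AS Attempts
  FROM exam.ExamineeTest
  WHERE (TestRevisionID = 3) AND (TestID = 2)
  GROUP BY ExamineeID
) AS total ON total.ExamineeID = E.ExamineeID
LEFT JOIN (
  -- attempts ranked by score, best first
  SELECT
    ExamineeID,
    ExamineeTestID AS bestExamineeTestID,
    Score AS bestScore,
    DateDue AS bestDateDue,
    TimeCommitted AS bestTimeCommitted,
    ROW_NUMBER() OVER (PARTITION BY ExamineeID ORDER BY Score DESC) AS takeme
  FROM exam.ExamineeTest
  WHERE (TestRevisionID = 3) AND (TestID = 2)
) AS best ON best.ExamineeID = E.ExamineeID AND best.takeme = 1
LEFT JOIN (
  -- attempts ranked by due date, most recent first
  SELECT
    ExamineeID,
    ExamineeTestID AS currentExamineeTestID,
    Score AS currentScore,
    DateDue AS currentDateDue,
    TimeCommitted AS currentTimeCommitted,
    ROW_NUMBER() OVER (PARTITION BY ExamineeID ORDER BY DateDue DESC) AS takeme
  FROM exam.ExamineeTest
  WHERE (TestRevisionID = 3) AND (TestID = 2)
) AS cur ON cur.ExamineeID = E.ExamineeID AND cur.takeme = 1
```

The takeme = 1 tests sit in the ON clauses rather than in a WHERE clause so that examinees without any attempts are still returned by the LEFT JOINs.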

Which one is better, faster, or more readable is for you to decide. Do some testing.

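Answer 2 (score: 1)

It looks like you could replace the three columns based on the alias "bestTest" with a view. All three of those subqueries have the same WHERE clause and the same ORDER BY clause.

Ditto for the subquery aliased "bestNewTest". Ditto for the subquery aliased "currentTest".

If I counted right, that would replace 8 subqueries with 3 views. You can join on the views. I think the joins would be faster, but if I were you, I'd check the execution plans of both versions.

Answer 3 (score: 0)

You can use a CTE and OUTER APPLY.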

;WITH testScores AS
(
    SELECT ExamineeID, ExamineeTestID, Score, DateDue, TimeCommitted
    FROM exam.ExamineeTest
    WHERE TestRevisionID = 3 AND TestID = 2
)
SELECT E.ExamineeID, E.LastName, E.FirstName, E.Email, total.Attempts,
       bestTest.*, currentTest.*
FROM exam.Examinee AS E
LEFT OUTER JOIN
(
    SELECT ExamineeID, COUNT(ExamineeTestID) AS Attempts
    FROM testScores
    GROUP BY ExamineeID
) AS total ON E.ExamineeID = total.ExamineeID
OUTER APPLY
(
    SELECT TOP 1 ExamineeTestID, Score, DateDue, TimeCommitted
    FROM testScores AS t
    WHERE E.ExamineeID = t.ExamineeID
    ORDER BY Score DESC
) AS bestTest (bestExamineeTestID, bestScore, bestDateDue, bestTimeCommitted)
OUTER APPLY
(
    SELECT TOP 1 ExamineeTestID, Score, DateDue, TimeCommitted
    FROM testScores AS t
    WHERE E.ExamineeID = t.ExamineeID
    ORDER BY DateDue DESC
) AS currentTest (currentExamineeTestID, currentScore, currentDateDue,
                  currentTimeCommitted)