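I need to pull multiple columns from a subquery, and that subquery also needs a WHERE filter that references columns of the FROM table. I have a couple of questions about this:
Example:
In the example below, I'm writing a view to present test scores, specifically to discover failures that may need to be addressed or retaken.
I can't simply use a JOIN, because I need to filter my actual subquery first (note that I take the TOP 1 row for the examinee, sorted by either Score or DateDue descending).
My goal is to avoid writing (and executing) essentially the same subquery repeatedly.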
SELECT ExamineeID, LastName, FirstName, Email,
(SELECT COUNT(examineeTestID)
FROM exam.ExamineeTest tests
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2) Attempts,
(SELECT TOP 1 ExamineeTestID
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestExamineeTestID,
(SELECT TOP 1 Score
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestScore,
(SELECT TOP 1 DateDue
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestDateDue,
(SELECT TOP 1 TimeCommitted
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestTimeCommitted,
(SELECT TOP 1 ExamineeTestID
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentExamineeTestID,
(SELECT TOP 1 Score
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentScore,
(SELECT TOP 1 DateDue
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentDateDue,
(SELECT TOP 1 TimeCommitted
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentTimeCommitted
FROM exam.Examinee E
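Answer 0 (score: 9)
To answer your second question first: yes, a better way is in order. The query you are using is hard to understand and hard to maintain, and even if its performance is acceptable now, querying the same table many times when you don't need to is a shame. If your application grows to an appreciable size, that extra cost may not always be acceptable.
To answer your first question, I have a few approaches for you. They assume SQL 2005 or later unless noted otherwise.
Note that you don't need BestExamineeID and CurrentExamineeID, since they will always be the same as ExamineeID unless no tests were taken and they are NULL, which you can tell from the other columns being NULL.
You can think of OUTER/CROSS APPLY as an operator that lets you move a correlated subquery from the WHERE clause into the JOIN clause. The applied query can have outer references to previously named tables and can return more than one column. This lets you do the work once per logical query rather than once per column.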
SELECT
ExamineeID,
LastName,
FirstName,
Email,
B.Attempts,
BestScore = B.Score,
BestDateDue = B.DateDue,
BestTimeCommitted = B.TimeCommitted,
CurrentScore = C.Score,
CurrentDateDue = C.DateDue,
CurrentTimeCommitted = C.TimeCommitted
FROM
exam.Examinee E
OUTER APPLY ( -- change to CROSS APPLY if you only want examinees who've tested
SELECT TOP 1
Score, DateDue, TimeCommitted,
Attempts = Count(*) OVER ()
FROM exam.ExamineeTest T
WHERE
E.ExamineeID = T.ExamineeID
AND T.TestRevisionID = 3
AND T.TestID = 2
ORDER BY Score DESC
) B
OUTER APPLY ( -- change to CROSS APPLY if you only want examinees who've tested
SELECT TOP 1
Score, DateDue, TimeCommitted
FROM exam.ExamineeTest T
WHERE
E.ExamineeID = T.ExamineeID
AND T.TestRevisionID = 3
AND T.TestID = 2
ORDER BY DateDue DESC
) C
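You should experiment to see whether my Count(*) OVER () performs better than an additional OUTER APPLY that only gets the count. If you are not restricting the examinees from the exam.Examinee table, a normal aggregate in a derived table may well be better.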
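The "normal aggregate in a derived table" alternative could look roughly like the sketch below. It assumes the tables and columns from the question; the alias A and the ISNULL wrapper (which reports 0 instead of NULL for examinees with no attempts) are my own additions, not part of the original answer.

```sql
-- Sketch: count attempts once per examinee in a derived table,
-- instead of computing Count(*) OVER () inside the APPLY.
SELECT
    E.ExamineeID,
    Attempts = ISNULL(A.Attempts, 0)  -- assumption: report 0 rather than NULL when untested
FROM
    exam.Examinee E
    LEFT JOIN (
        SELECT ExamineeID, Attempts = COUNT(*)
        FROM exam.ExamineeTest
        WHERE TestRevisionID = 3
            AND TestID = 2
        GROUP BY ExamineeID
    ) A ON E.ExamineeID = A.ExamineeID;
```

The best-score and current-score APPLY blocks would then be added to this query without the Count(*) OVER () column.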
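Here is another approach that (sort of) gets all the data in one fell swoop. It could conceivably perform better than the other queries, except that in my experience windowing functions can get very expensive in some situations, so testing is in order.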
WITH Data AS (
SELECT
*,
Count(*) OVER (PARTITION BY ExamineeID) Cnt,
Row_Number() OVER (PARTITION BY ExamineeID ORDER BY Score DESC) ScoreOrder,
Row_Number() OVER (PARTITION BY ExamineeID ORDER BY DateDue DESC) DueOrder
FROM
exam.ExamineeTest
WHERE
TestRevisionID = 3
AND TestID = 2
), Vals AS (
SELECT
ExamineeID,
Max(Cnt) Attempts,
Max(CASE WHEN ScoreOrder = 1 THEN Score ELSE NULL END) BestScore,
Max(CASE WHEN ScoreOrder = 1 THEN DateDue ELSE NULL END) BestDateDue,
Max(CASE WHEN ScoreOrder = 1 THEN TimeCommitted ELSE NULL END) BestTimeCommitted,
Max(CASE WHEN DueOrder = 1 THEN Score ELSE NULL END) CurrentScore,
Max(CASE WHEN DueOrder = 1 THEN DateDue ELSE NULL END) CurrentDateDue,
Max(CASE WHEN DueOrder = 1 THEN TimeCommitted ELSE NULL END) CurrentTimeCommitted
FROM Data
GROUP BY
ExamineeID
)
SELECT
E.ExamineeID,
E.LastName,
E.FirstName,
E.Email,
V.Attempts,
V.BestScore, V.BestDateDue, V.BestTimeCommitted,
V.CurrentScore, V.CurrentDateDue, V.CurrentTimeCommitted
FROM
exam.Examinee E
LEFT JOIN Vals V ON E.ExamineeID = V.ExamineeID
-- change join to INNER if you only want examinees who've tested
Finally, here is a SQL 2000 method:
SELECT
E.ExamineeID,
E.LastName,
E.FirstName,
E.Email,
Y.Attempts,
Y.BestScore, Y.BestDateDue, Y.BestTimeCommitted,
Y.CurrentScore, Y.CurrentDateDue, Y.CurrentTimeCommitted
FROM
exam.Examinee E
LEFT JOIN ( -- change to inner if you only want examinees who've tested
SELECT
X.ExamineeID,
X.Cnt Attempts,
Max(CASE Y.Which WHEN 1 THEN T.Score ELSE NULL END) BestScore,
Max(CASE Y.Which WHEN 1 THEN T.DateDue ELSE NULL END) BestDateDue,
Max(CASE Y.Which WHEN 1 THEN T.TimeCommitted ELSE NULL END) BestTimeCommitted,
Max(CASE Y.Which WHEN 2 THEN T.Score ELSE NULL END) CurrentScore,
Max(CASE Y.Which WHEN 2 THEN T.DateDue ELSE NULL END) CurrentDateDue,
Max(CASE Y.Which WHEN 2 THEN T.TimeCommitted ELSE NULL END) CurrentTimeCommitted
FROM
(
SELECT ExamineeID, Max(Score) MaxScore, Max(DateDue) MaxDueDate, Count(*) Cnt
FROM exam.ExamineeTest
WHERE
TestRevisionID = 3
AND TestID = 2
GROUP BY ExamineeID
) X
CROSS JOIN (SELECT 1 UNION ALL SELECT 2) Y (Which)
INNER JOIN exam.ExamineeTest T
ON X.ExamineeID = T.ExamineeID
AND (
(Y.Which = 1 AND X.MaxScore = T.Score)
OR (Y.Which = 2 AND X.MaxDueDate = T.DateDue)
)
WHERE
T.TestRevisionID = 3
AND T.TestID = 2
GROUP BY
X.ExamineeID,
X.Cnt
) Y ON E.ExamineeID = Y.ExamineeID
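If the combination of (ExamineeID, Score) or (ExamineeID, DateDue) can match more than one row, this query will pick up unexpected extra rows in the join, and the MAX expressions can then mix values from different tests. With Score, ties are not unlikely. If neither combination is unique, you need to use (or add) some additional column that grants uniqueness, so it can be used to select a single row. If only Score can be duplicated, then an extra pre-query that gets the max Score, dovetailed with the max DateDue for that score, would pull the most recent of the tests tied for the highest score while still getting the most recent data. Let me know if you need more SQL 2000 help.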
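A minimal sketch of that tie-breaking pre-query, assuming the question's schema and that only Score can repeat for an examinee. The aliases M, MaxScore and BestDateDue are illustrative. It uses only constructs that SQL 2000 supports.

```sql
-- Sketch: for each examinee, find the highest Score, then the latest DateDue
-- among the tests that share that highest Score.
SELECT
    T.ExamineeID,
    T.Score AS MaxScore,
    MAX(T.DateDue) AS BestDateDue  -- most recent test among those tied for best
FROM exam.ExamineeTest T
INNER JOIN (
    SELECT ExamineeID, MAX(Score) AS MaxScore
    FROM exam.ExamineeTest
    WHERE TestRevisionID = 3
        AND TestID = 2
    GROUP BY ExamineeID
) M
    ON M.ExamineeID = T.ExamineeID
    AND M.MaxScore = T.Score
WHERE
    T.TestRevisionID = 3
    AND T.TestID = 2
GROUP BY
    T.ExamineeID,
    T.Score;
```

Joining back to exam.ExamineeTest on (ExamineeID, Score, DateDue) then selects a single "best" row, provided (ExamineeID, DateDue) is unique.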
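Note: the most important factors in whether the CROSS APPLY or the ROW_NUMBER() solution performs better are whether you have an index on the columns being looked up, and whether the data is dense or sparse.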
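As an illustration of the kind of index meant here, the sketch below shows two candidates that could support the TOP 1 lookups. The index names are hypothetical, and the key order and included columns are assumptions that should be checked against the actual execution plans and data distribution.

```sql
-- Hypothetical indexes (SQL 2005 or later, because of INCLUDE).
-- One supports "best by Score", the other "latest by DateDue".
CREATE NONCLUSTERED INDEX IX_ExamineeTest_Examinee_Score
ON exam.ExamineeTest (ExamineeID, TestID, TestRevisionID, Score DESC)
INCLUDE (DateDue, TimeCommitted);

CREATE NONCLUSTERED INDEX IX_ExamineeTest_Examinee_DateDue
ON exam.ExamineeTest (ExamineeID, TestID, TestRevisionID, DateDue DESC)
INCLUDE (Score, TimeCommitted);
```

With indexes like these, each APPLY can seek directly to one row per examinee, which tends to favor the APPLY form when only a subset of examinees is queried.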
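The solution I gave for SQL 2000 will probably perform the worst, but that's not guaranteed. As I said, testing is in order.
If any of my queries do give you performance problems, let me know and I'll see what I can do to help. I'm sure I have typos, since I didn't work up any DDL to recreate your tables, but I did my best without trying the queries.
If performance really does become crucial, I would create ExamineeTestBest and ExamineeTestCurrent tables that are pushed to by a trigger on the ExamineeTest table, which would always keep them up to date. However, this is denormalization, and it is probably not necessary or a good idea unless you have scaled so large that retrieving results takes unacceptably long.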
Answer 1 (score: 4)
That is not the same subquery. It is three different subqueries:
count()
TOP (1) ORDER BY Score DESC
TOP (1) ORDER BY DateDue DESC
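You can't get away with executing fewer than 3 queries. The question is how to make sure no more than 3 are executed.
One option is to write 3 inline table functions and use them with outer apply. Make sure they really are inline, otherwise your performance will drop a hundredfold. One of the three functions could be: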
create function dbo.topexaminee_byscore(@ExamineeID int)
returns table
as
return (
SELECT top (1)
ExamineeTestID as bestExamineeTestID,
Score as bestScore,
DateDue as bestDateDue,
TimeCommitted as bestTimeCommitted
FROM exam.ExamineeTest
WHERE (ExamineeID = @ExamineeID) AND (TestRevisionID = 3) AND (TestID = 2)
ORDER BY Score DESC
)
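Once that function exists, using it with outer apply might look like the sketch below. The aliases E and B are illustrative, and the other two functions would be applied the same way.

```sql
-- Sketch: call the inline function once per examinee row.
-- OUTER APPLY keeps examinees with no matching test (columns come back NULL).
SELECT
    E.ExamineeID, E.LastName, E.FirstName, E.Email,
    B.bestExamineeTestID, B.bestScore, B.bestDateDue, B.bestTimeCommitted
FROM exam.Examinee E
OUTER APPLY dbo.topexaminee_byscore(E.ExamineeID) B;
```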
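Another option is essentially the same, but using subqueries. Because you are fetching data for all students anyway, there shouldn't be much difference performance-wise. Create three subqueries, for example: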
select ExamineeID, bestExamineeTestID, bestScore, bestDateDue, bestTimeCommitted
from (
SELECT
ExamineeID,
ExamineeTestID as bestExamineeTestID,
Score as bestScore,
DateDue as bestDateDue,
TimeCommitted as bestTimeCommitted,
row_number() over (partition by ExamineeID order by Score DESC) as takeme
FROM exam.ExamineeTest
WHERE (TestRevisionID = 3) AND (TestID = 2)
) as foo
where foo.takeme = 1
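Do the same for the other two subqueries, with the appropriate columns in the select.
Join these three to the examinee table.
What will be better, more performant or more readable is up to you. Do some testing.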
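Joining the three subqueries to the examinee table could be sketched as follows. This assumes the question's schema; the aliases A, B and C are illustrative, and only the score columns are shown to keep the example short.

```sql
-- Sketch: one derived table per logical subquery, each joined on ExamineeID.
SELECT
    E.ExamineeID, E.LastName, E.FirstName, E.Email,
    A.Attempts,
    B.bestScore,
    C.currentScore
FROM exam.Examinee E
LEFT JOIN (
    SELECT ExamineeID, COUNT(*) AS Attempts
    FROM exam.ExamineeTest
    WHERE (TestRevisionID = 3) AND (TestID = 2)
    GROUP BY ExamineeID
) AS A ON A.ExamineeID = E.ExamineeID
LEFT JOIN (
    SELECT ExamineeID, Score AS bestScore,
        row_number() over (partition by ExamineeID order by Score DESC) AS takeme
    FROM exam.ExamineeTest
    WHERE (TestRevisionID = 3) AND (TestID = 2)
) AS B ON B.ExamineeID = E.ExamineeID AND B.takeme = 1
LEFT JOIN (
    SELECT ExamineeID, Score AS currentScore,
        row_number() over (partition by ExamineeID order by DateDue DESC) AS takeme
    FROM exam.ExamineeTest
    WHERE (TestRevisionID = 3) AND (TestID = 2)
) AS C ON C.ExamineeID = E.ExamineeID AND C.takeme = 1;
```

Putting the takeme = 1 filter in the ON clause rather than in a WHERE clause keeps examinees who have no tests in the result.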
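Answer 2 (score: 1)
It looks like you can replace the three columns based on the alias "bestTest" with a view. All three of those subqueries have the same WHERE clause and the same ORDER BY clause.
Ditto for the subquery alias "bestNewTest". Ditto for the subquery alias "currentTest".
If I counted right, that would replace 8 subqueries with 3 views. You can join on the views. I think the joins would be faster, but if I were you, I'd check the execution plans of both versions.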
Answer 3 (score: 0)
You can use a CTE and OUTER APPLY.
;WITH testScores AS
(
SELECT ExamineeID, ExamineeTestID, Score, DateDue, TimeCommitted
FROM exam.ExamineeTest
WHERE TestRevisionID = 3 AND TestID = 2
)
SELECT exam.Examinee.ExamineeID, LastName, FirstName, Email, total.Attempts,
bestTest.*, currentTest.*
FROM exam.Examinee
LEFT OUTER JOIN
(
SELECT ExamineeID, COUNT(ExamineeTestID) AS Attempts
FROM testScores
GROUP BY ExamineeID
) AS total ON exam.Examinee.ExamineeID = total.ExamineeID
OUTER APPLY
(
SELECT TOP 1 ExamineeTestID, Score, DateDue, TimeCommitted
FROM testScores AS t
WHERE exam.Examinee.ExamineeID = t.ExamineeID
ORDER BY Score DESC
) AS bestTest (bestExamineeTestID, bestScore, bestDateDue, bestTimeCommitted)
OUTER APPLY
(
SELECT TOP 1 ExamineeTestID, Score, DateDue, TimeCommitted
FROM testScores AS t
WHERE exam.Examinee.ExamineeID = t.ExamineeID
ORDER BY DateDue DESC
) AS currentTest (currentExamineeTestID, currentScore, currentDateDue,
currentTimeCommitted)