Question

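I am querying a table that tracks the results of an exam taken by students. The exam consists of multiple sections, and there is a column for each section's score. Each row is one instance of a student taking the exam. The sections can either be taken all at once or split across multiple attempts. For example, a student can take one section today and the rest tomorrow. In addition, a student can retake any section of the exam.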

Sample student:

StudentID   WritingSection   ReadingSection   MathSection   DateTaken
1           65               85               54            4/1/2013 14:53
1           98               NULL             NULL          4/8/2013 13:13
1           NULL             NULL             38            5/3/2013 12:43

NULL means the section was not administered in the given test instance, and a second score for a section means the section was retaken.

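I would like a query that groups by StudentID so that there is only one row per student, and that returns the most recent score for each section. I am looking for an efficient way to do this, because we have several hundred thousand test attempts in the database.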

Expected result:

StudentID    WritingSection    ReadingSection    MathSection    DateTaken
1            98                85                38             5/3/2013 12:43

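EDIT: There are a lot of good solutions here. I want to experiment a bit more over the next week before choosing an answer. Thanks, everyone!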

Answer 1

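This is a tricky one. Each section score can potentially come from a different record, and the normal rules of max() and min() don't apply, because you want the latest score rather than the highest or lowest.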

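The following query computes a sequence number for each section, starting from the most recent non-NULL value. The outer query then uses these sequence numbers for conditional aggregation: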

select s.StudentId,
       max(case when ws_seqnum = 1 then WritingSection end) as WritingSection,
       max(case when rs_seqnum = 1 then ReadingSection end) as ReadingSection,
       max(case when ms_seqnum = 1 then MathSection end) as MathSection,
       max(DateTaken) as DateTaken
from (select s.*,
             row_number() over (partition by studentid
                                order by (case when WritingSection is not null then 0 else 1 end), DateTaken desc
                               ) as ws_seqnum,
             row_number() over (partition by studentid
                                order by (case when ReadingSection is not null then 0 else 1 end), DateTaken desc
                               ) as rs_seqnum,
             row_number() over (partition by studentid
                                order by (case when MathSection is not null then 0 else 1 end), DateTaken desc
                               ) as ms_seqnum
      from student s
     ) s
where StudentId = 1
group by StudentId;

The where clause is optional in this query. You can remove it, and the query should still work for all students.

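This query is more complicated than it needs to be, because the data is not normalized. If you have control over the data structure, consider an association/junction table with one row per student per test, with the score and the test date as columns in that table. (Full normalization would introduce another table for the test dates, but that probably isn't necessary.)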
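
As a rough illustration of the junction-table idea, here is a minimal sketch using Python's built-in sqlite3 module. The table name section_score, its column names, and the ISO-formatted date strings are assumptions made for this example; they are not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized layout: one row per student, section and attempt.
    CREATE TABLE section_score (
        student_id INTEGER NOT NULL,
        section    TEXT    NOT NULL,
        score      INTEGER NOT NULL,
        date_taken TEXT    NOT NULL   -- ISO-8601 text, so it sorts chronologically
    );
    INSERT INTO section_score VALUES
        (1, 'Writing', 65, '2013-04-01 14:53'),
        (1, 'Reading', 85, '2013-04-01 14:53'),
        (1, 'Math',    54, '2013-04-01 14:53'),
        (1, 'Writing', 98, '2013-04-08 13:13'),
        (1, 'Math',    38, '2013-05-03 12:43');
""")

# Latest score per (student, section): keep the rows whose date equals
# the maximum date recorded for that student and section.
latest = conn.execute("""
    SELECT s.student_id, s.section, s.score, s.date_taken
    FROM section_score s
    JOIN (SELECT student_id, section, MAX(date_taken) AS max_date
          FROM section_score
          GROUP BY student_id, section) m
      ON  m.student_id = s.student_id
      AND m.section    = s.section
      AND m.max_date   = s.date_taken
    ORDER BY s.student_id, s.section
""").fetchall()

for row in latest:
    print(row)
```

With this layout there are no NULL placeholders to skip over, so "latest score per section" becomes an ordinary greatest-per-group query.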

Answer 2

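Sorry, my earlier answer addresses a different question from the one that was asked :) It returns all the data from the MOST RECENT row. The question actually asked is how to aggregate over all rows to get the most recent score for each subject separately.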

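I am leaving it up anyway, because the question I answered is a very common one, and someone landing on this page may actually have that question instead :)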

Now, to answer the actual question:

I think the cleanest way to do this is with PIVOT and UNPIVOT:

SELECT StudentID,
       MAX([WritingSection]) AS WritingSection,
       MAX([ReadingSection]) AS ReadingSection,
       MAX([MathSection]) AS MathSection,
       MAX(DateTaken) AS DateTaken
FROM (
  SELECT StudentID, Subject, DateTaken, Score
  FROM (
    SELECT StudentID, Subject, DateTaken, Score
      , row_number() OVER (PARTITION BY StudentID, Subject ORDER BY DateTaken DESC) as rowNum
    FROM Students s
    UNPIVOT (
      Score FOR Subject IN ([WritingSection],[ReadingSection],[MathSection])
    ) u
  ) x
  WHERE x.rowNum = 1
) y
PIVOT (
  MAX(Score) FOR Subject IN ([WritingSection],[ReadingSection],[MathSection])
) p
-- PIVOT also groups by DateTaken, so collapse to one row per student here
GROUP BY StudentID

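The innermost subquery (x) uses SQL's UNPIVOT function to normalize the data (meaning it turns each student's score on each section of the test into its own row).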

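The next subquery out (y) simply filters the rows down to the most recent score FOR EACH SUBJECT INDIVIDUALLY (this is a workaround for the SQL limitation that window functions such as row_number() cannot be used in a WHERE clause).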

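Finally, since you want the data displayed in the original denormalized format (one column for each section of the test), we use SQL's PIVOT function. This simply turns rows back into columns, one for each section of the test. Lastly, you said you wanted the date of the most recent test shown (despite the fact that each section can have its own "most recent" date), so we just aggregate over those 3 potentially different DateTaken values to find the latest one.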

If more sections are added in the future, this will be easier to extend than the other solutions: just add the new column names to the lists.
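
For readers without SQL Server at hand, the same three steps (unpivot, keep the latest row per subject, pivot back) can be sketched in SQLite through Python's sqlite3 module. This is only an approximation under stated assumptions: SQLite has no PIVOT or UNPIVOT, so UNION ALL and conditional aggregation stand in for them, the dates are stored as ISO-formatted text, and row_number() requires SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (
        StudentID INTEGER, WritingSection INTEGER, ReadingSection INTEGER,
        MathSection INTEGER,
        DateTaken TEXT   -- assumption: ISO-8601 text, sorts chronologically
    );
    INSERT INTO Students VALUES
        (1, 65,   85,   54,   '2013-04-01 14:53'),
        (1, 98,   NULL, NULL, '2013-04-08 13:13'),
        (1, NULL, NULL, 38,   '2013-05-03 12:43');
""")

rows = conn.execute("""
    WITH unpivoted AS (   -- stands in for UNPIVOT: one row per non-NULL score
        SELECT StudentID, 'Writing' AS Subject, WritingSection AS Score, DateTaken
        FROM Students WHERE WritingSection IS NOT NULL
        UNION ALL
        SELECT StudentID, 'Reading', ReadingSection, DateTaken
        FROM Students WHERE ReadingSection IS NOT NULL
        UNION ALL
        SELECT StudentID, 'Math', MathSection, DateTaken
        FROM Students WHERE MathSection IS NOT NULL
    ),
    ranked AS (           -- rowNum = 1 marks the latest score per subject
        SELECT *, row_number() OVER (PARTITION BY StudentID, Subject
                                     ORDER BY DateTaken DESC) AS rowNum
        FROM unpivoted
    )
    SELECT StudentID,     -- stands in for PIVOT: rows back into columns
           MAX(CASE WHEN Subject = 'Writing' THEN Score END) AS WritingSection,
           MAX(CASE WHEN Subject = 'Reading' THEN Score END) AS ReadingSection,
           MAX(CASE WHEN Subject = 'Math'    THEN Score END) AS MathSection,
           MAX(DateTaken) AS DateTaken
    FROM ranked
    WHERE rowNum = 1
    GROUP BY StudentID
""").fetchall()

print(rows)
```

On the sample data from the question this yields a single row for student 1 with scores 98, 85 and 38.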

Answer 3

How about using the following to get the max DateTaken?

SELECT max(DateTaken) FROM TABLE_NAME WHERE StudentID = 1

You could then use that in a subquery to get a row, like this:

SELECT WritingSection FROM TABLE_NAME WHERE StudentID = 1 AND DateTaken = (SELECT max(DateTaken) FROM TABLE_NAME WHERE StudentID = 1 AND WritingSection IS NOT NULL)

You would need to run this twice more, once for ReadingSection and once for MathSection.
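
One possible way to avoid running three separate statements is to put the same subquery, once per section, into a single select list. The sketch below does this with Python's sqlite3 module. The table name Students, the ISO-formatted dates, and the extra student 2 are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (
        StudentID INTEGER, WritingSection INTEGER, ReadingSection INTEGER,
        MathSection INTEGER,
        DateTaken TEXT   -- assumption: ISO-8601 text, sorts chronologically
    );
    INSERT INTO Students VALUES
        (1, 65,   85,   54,   '2013-04-01 14:53'),
        (1, 98,   NULL, NULL, '2013-04-08 13:13'),
        (1, NULL, NULL, 38,   '2013-05-03 12:43'),
        (2, 70,   NULL, NULL, '2013-04-02 09:00');  -- hypothetical second student
""")

sections = ["WritingSection", "ReadingSection", "MathSection"]

# Build one correlated subquery per section: the score from the row whose
# date is the latest date on which that section has a non-NULL score.
subqueries = ",\n".join(
    f"""(SELECT w.{col} FROM Students w
          WHERE w.StudentID = s.StudentID
            AND w.DateTaken = (SELECT MAX(DateTaken) FROM Students
                                WHERE StudentID = s.StudentID
                                  AND {col} IS NOT NULL)) AS {col}"""
    for col in sections
)

query = f"""
    SELECT s.StudentID,
           {subqueries},
           s.DateTaken
    FROM (SELECT StudentID, MAX(DateTaken) AS DateTaken
          FROM Students GROUP BY StudentID) s
    ORDER BY s.StudentID
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A student who never took a section simply gets NULL (None in Python) in that column, as student 2 does here for reading and math.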

Answer 4

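Joe's solution will return only one student ID: the one with the most recent exam. The way to get the latest date for each student ID is to use an analytic function. Here is an example, assuming you are using an Oracle database: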

SELECT a.StudentID, a.DateTaken
  FROM (  SELECT StudentID,
             DateTaken,
             ROW_NUMBER ()
                OVER (PARTITION BY StudentID ORDER BY DateTaken DESC)
                rn
        FROM pto.test
    ORDER BY DateTaken DESC) a
 WHERE a.rn = 1

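Note how the row_number() function assigns 1 to the latest date for each student ID. In the outer select, you simply filter on rn = 1 to keep those records. Run the inner select on its own to see how it works. Let me know which database you are using so I can give you a solution for it. Each database has its own implementation of analytic functions, but the logic is the same.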

Answer 5

This is a classic, annoying problem in SQL, and there is no super elegant way to do it. Here is the best I have found:

SELECT s.*
FROM Students s
JOIN (
  SELECT StudentID, MAX(DateTaken) as MaxDateTaken
  FROM Students
  GROUP BY StudentID
) f ON s.StudentID = f.StudentID AND s.DateTaken = f.MaxDateTaken

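Joining on the date field is not ideal (it breaks if there is a tie for the MAX) and not necessarily fast (that depends on how the table is indexed). If you have an int rowID that is unique across all rows, it would be preferable to do this: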

SELECT s.*
FROM Students s
JOIN (
  SELECT rowID
  FROM (
    SELECT StudentID, rowID, row_number() OVER (PARTITION BY StudentID ORDER BY DateTaken DESC) as rowNumber
    FROM Students
  ) x
  WHERE x.rowNumber = 1
) f ON s.rowID = f.rowID
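
To make the tie problem concrete, here is a small sketch using Python's sqlite3 module. The sample rows, the ISO-formatted dates, and the extra rowID DESC tiebreaker in the ORDER BY are assumptions added for illustration (the tiebreaker makes the chosen row deterministic), and row_number() requires SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (
        rowID INTEGER PRIMARY KEY, StudentID INTEGER,
        WritingSection INTEGER, DateTaken TEXT
    );
    -- Two attempts by the same student share one timestamp (a tie).
    INSERT INTO Students VALUES
        (1, 1, 65, '2013-04-01 14:53'),
        (2, 1, 98, '2013-05-03 12:43'),
        (3, 1, 70, '2013-05-03 12:43');
""")

# Joining on MAX(DateTaken): both tied rows match the join condition.
by_max_date = conn.execute("""
    SELECT s.rowID FROM Students s
    JOIN (SELECT StudentID, MAX(DateTaken) AS MaxDateTaken
          FROM Students GROUP BY StudentID) f
      ON s.StudentID = f.StudentID AND s.DateTaken = f.MaxDateTaken
""").fetchall()

# Joining on the rowID picked by row_number(): exactly one row per student.
by_row_number = conn.execute("""
    SELECT s.rowID FROM Students s
    JOIN (SELECT rowID FROM (
              SELECT rowID,
                     row_number() OVER (PARTITION BY StudentID
                                        ORDER BY DateTaken DESC, rowID DESC) AS rowNumber
              FROM Students) x
          WHERE x.rowNumber = 1) f
      ON s.rowID = f.rowID
""").fetchall()

print(len(by_max_date), len(by_row_number))
```

The date join returns two rows for the student, while the row_number() version returns one.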

Answer 6

SELECT student.studentid, 
       WRITE.writingsection, 
       READ.readingsection, 
       math.mathsection, 
       student.datetaken 
FROM 
-- list of students / max dates taken 
(SELECT studentid, 
        Max(datetaken) datetaken 
 FROM   test_record 
 GROUP  BY studentid) student, 
-- greatest date for student with a writingsection score (dont care what the date is here, just that the score comes from the greatest date) 
(SELECT studentid, 
        writingsection 
 FROM   test_record t 
 WHERE  writingsection IS NOT NULL 
        AND datetaken = (SELECT Max(datetaken) 
                         FROM   test_record 
                         WHERE  studentid = t.studentid 
                                AND writingsection IS NOT NULL)) WRITE, 
(SELECT studentid, 
        readingsection 
 FROM   test_record t 
 WHERE  readingsection IS NOT NULL 
        AND datetaken = (SELECT Max(datetaken) 
                         FROM   test_record 
                         WHERE  studentid = t.studentid 
                                AND readingsection IS NOT NULL)) READ, 
(SELECT studentid, 
        mathsection 
 FROM   test_record t 
 WHERE  mathsection IS NOT NULL 
        AND datetaken = (SELECT Max(datetaken) 
                         FROM   test_record 
                         WHERE  studentid = t.studentid 
                                AND mathsection IS NOT NULL)) math 
WHERE 
  -- outer join in case a student has no score recorded for one or more of the sections  
  student.studentid = READ.studentid(+) 
  AND student.studentid = WRITE.studentid(+) 
  AND student.studentid = math.studentid(+);

Group by column, selecting the most recent value
