ROW_NUMBER does not order ties correctly

Date: 2013-01-22 22:53:16

Tags: sql

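I am trying to use row_number to compute the median, lower quartile, and upper quartile for a box plot. However, because of ties, my row_number ordering is off.

Here is some sample data: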

CREATE TABLE EStats    
(
    PersonID            VARCHAR(30)     NOT NULL,
    Grade               VARCHAR(25)     NOT NULL,
    CourseDate          Date            NOT NULL
);

INSERT INTO EStats
(
    PersonID, Grade, CourseDate
)

VALUES
    ('100', '91', '2010-03-01'),
    ('101', '96', '2010-03-01'),
    ('102', '88', '2010-03-01'),
    ('103', '92', '2010-03-01'),
    ('104', '81', '2010-03-01'),
    ('105', '85', '2010-03-01'),
    ('106', '91', '2010-03-01'),
    ('107', '89', '2010-03-01'),
    ('108', '99', '2010-03-01'),
    ('109', '88', '2010-03-01'),
    ('110', '81', '2011-03-02'),
    ('111', '77', '2011-03-02'),
    ('112', '88', '2011-03-02'),
    ('113', '76', '2011-03-02'),
    ('114', '69', '2011-03-02'),
    ('115', '70', '2011-03-02'),
    ('116', '75', '2011-03-02'),
    ('117', '88', '2011-03-02'),
    ('118', '76', '2011-03-02'),
    ('119', '95', '2012-03-01'),
    ('120', '96', '2012-03-01'),
    ('121', '90', '2012-03-01'),
    ('122', '80', '2012-03-01'),
    ('123', '85', '2012-03-01'),
    ('124', '94', '2012-03-01'),
    ('125', '89', '2012-03-01'),
    ('126', '97', '2012-03-01'),
    ('127', '94', '2012-03-01'),
    ('128', '72', '2012-03-01'),
    ('129', '88', '2012-03-01'),
    ('130', '91', '2012-03-01')

Here is one of my inner queries, which shows the ordering going wrong:

SELECT
    CourseDate,
    Grade,
    ROW_NUMBER() OVER (
        PARTITION BY LEFT(CourseDate, 4)
        ORDER BY Grade ASC) AS RowAsc,
    ROW_NUMBER() OVER (
        PARTITION BY LEFT(CourseDate, 4)
        ORDER BY Grade DESC) AS RowDesc
FROM EStats

Note that for CourseDate 2010-03-01, RowAsc does this:

10
9
8
6
7
5
3
4
2
1

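However, I need all of the rows numbered in sequence so that I can compute the median when there is an even number of rows. (Rank and dense_rank do not work because of the "gaps" they leave.)

In fact, here is the whole thing. Again, I am trying to compute the median, lower quartile, upper quartile, minimum, and maximum for a box plot. Any help is greatly appreciated!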

WITH Q3 AS
(
    SELECT
        CourseDate,
        AVG(CAST(Grade AS Numeric)) AS Median

    FROM
    (
        SELECT
            CourseDate,
            Grade,
            ROW_NUMBER() OVER (
                PARTITION BY LEFT(CourseDate, 4)
                ORDER BY Grade ASC) AS RowAsc,
            ROW_NUMBER() OVER (
                PARTITION BY LEFT(CourseDate, 4)
                ORDER BY Grade DESC) AS RowDesc
        FROM EStats
    )x
    WHERE 
        RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
    GROUP BY CourseDate
    --ORDER BY CourseDate
),

Q2 AS
(
    SELECT
        x.CourseDate,
        AVG(CAST(Grade AS Numeric)) AS LowerQuartile

    FROM
    (
        SELECT
            Estats.CourseDate,
            Estats.Grade,
            ROW_NUMBER() OVER (
                PARTITION BY LEFT(EStats.CourseDate, 4)
                ORDER BY Grade ASC) AS RowAsc,
            ROW_NUMBER() OVER (
                PARTITION BY LEFT(Estats.CourseDate, 4)
                ORDER BY Grade DESC) AS RowDesc
        FROM EStats JOIN Q3 on EStats.CourseDate = Q3.CourseDate
        WHERE EStats.Grade < Q3.Median 
    )x
    WHERE
        RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
    GROUP BY x.CourseDate
),

Q4 AS
(
    SELECT
        x.CourseDate,
        AVG(CAST(Grade AS Numeric)) AS UpperQuartile

    FROM
    (
        SELECT
            Estats.CourseDate,
            Estats.Grade,
            ROW_NUMBER() OVER (
                PARTITION BY LEFT(EStats.CourseDate, 4)
                ORDER BY Grade ASC) AS RowAsc,
            ROW_NUMBER() OVER (
                PARTITION BY LEFT(Estats.CourseDate, 4)
                ORDER BY Grade DESC) AS RowDesc
        FROM EStats JOIN Q3 on EStats.CourseDate = Q3.CourseDate
        WHERE EStats.Grade > Q3.Median 
    )x
    WHERE
        RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
    GROUP BY x.CourseDate
)

SELECT Q3.CourseDate, Q3.Median AS Median, Q2.LowerQuartile, Q4.UpperQuartile, MIN(EStats.Grade) AS Min, MAX(EStats.Grade) AS Max
FROM Q3
    JOIN Q2 ON Q3.CourseDate = Q2.CourseDate
    JOIN Q4 ON Q3.CourseDate = Q4.CourseDate
    JOIN EStats ON Q3.CourseDate = EStats.CourseDate
GROUP BY Q3.CourseDate, Q3.Median, Q2.LowerQuartile, Q4.UpperQuartile
ORDER BY Q3.CourseDate

1 answer:

Answer 0 (score: 0)

Try this to get the median:

select avg(case when seqnum*2 = totnum+1 then col
                when seqnum*2 in (totnum, totnum + 2) then col
            end)
from (select t.*, row_number() over (order by col) as seqnum,
             count(*) over () as totnum
      from t
     ) t

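It looks cryptic, but the idea is to average the two middle values when the row count is even and to take the single middle value otherwise. If you are using SQL Server, remember that it uses integer arithmetic, so averaging an integer column truncates the result. You can actually simplify the above to: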

select avg(case when seqnum*2 in (totnum, totnum+1, totnum+2) then col end)

This works because an odd total count matches only totnum+1, while an even count matches the other two values.
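
As a rough sketch of how this could be applied to the EStats table from the question, the query below numbers the rows once per year and compares that number with the row count, so it does not rely on two ROW_NUMBER() orderings that can disagree on tied grades. It assumes SQL Server 2008 or later and that every Grade string holds a whole number; the aliases CourseYear, GradeNum, SeqNum, and TotNum are made up for this example.

```sql
-- Sketch only: per-year median of EStats.Grade using one ROW_NUMBER and one COUNT.
-- Assumptions: SQL Server 2008+; every Grade value is a whole number stored as text.
SELECT
    CourseYear,
    -- Rows that fail the test yield NULL, which AVG ignores.
    AVG(CASE WHEN SeqNum * 2 IN (TotNum, TotNum + 1, TotNum + 2)
             THEN GradeNum END) AS Median
FROM
(
    SELECT
        YEAR(CourseDate) AS CourseYear,
        -- Cast to DECIMAL so that AVG does not truncate to an integer.
        CAST(Grade AS DECIMAL(5, 2)) AS GradeNum,
        -- Order numerically; Grade is VARCHAR, so a plain ORDER BY compares strings.
        ROW_NUMBER() OVER (
            PARTITION BY YEAR(CourseDate)
            ORDER BY CAST(Grade AS INT)) AS SeqNum,
        COUNT(*) OVER (
            PARTITION BY YEAR(CourseDate)) AS TotNum
    FROM EStats
) AS x
GROUP BY CourseYear
ORDER BY CourseYear;
```

With the sample data this should give a median of 90 for 2010, 76 for 2011, and 90.5 for 2012. Tied grades may be numbered in either order, but that does not change the result because tied rows carry the same value.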