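Finding the median of dates in SQL Server 2008

Posted: 2014-12-12 17:20:25

Tags: sql sql-server-2008 date median

I'm looking for a way to get the MEDIAN of a range of start and end dates (lots and lots of dates). However, the median has to be computed separately for each invoice number. See the sample data below.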

invoice_no   invoice start date   invoice end date
4006         11/14/2001           12/15/2004
20071        11/29/2001           02/01/2003
19893        11/30/2001           12/02/2001
19894        11/30/2001           12/04/2001
004          10/22/2002           10/31/2002

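I need the median between the start date and the end date.

The median is simply the value that lies between the start date and the end date for a particular invoice_no. I also want to filter the data as much as possible. I realized I can add WHERE STATUS <> 'REJECTED', which should also help keep out a lot of the indefinite dates. In addition, I only want to look at a window of a few months, so I added a BETWEEN filter on the DATETIME column as well.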
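In the sample data each invoice has a single start/end pair, so under the reading above its median is the midpoint of the two dates (the median of two values is their mean). The following Python sketch, not SQL, computes those midpoints as start + (end - start) / 2. It assumes the MM/DD/YYYY format shown in the table and keeps invoice numbers as strings so that "004" keeps its leading zeros; the names `ROWS`, `midpoint` and `invoice_midpoints` are hypothetical.

```python
from datetime import datetime

# Sample rows from the table above: (invoice_no, start date, end date), MM/DD/YYYY.
ROWS = [
    ("4006", "11/14/2001", "12/15/2004"),
    ("20071", "11/29/2001", "02/01/2003"),
    ("19893", "11/30/2001", "12/02/2001"),
    ("19894", "11/30/2001", "12/04/2001"),
    ("004", "10/22/2002", "10/31/2002"),
]


def midpoint(start: datetime, end: datetime) -> datetime:
    """Median of two datetimes: start plus half of the gap between them."""
    return start + (end - start) / 2


def invoice_midpoints(rows):
    """Map each invoice_no to the midpoint of its start and end dates."""
    result = {}
    for invoice_no, start_s, end_s in rows:
        start = datetime.strptime(start_s, "%m/%d/%Y")
        end = datetime.strptime(end_s, "%m/%d/%Y")
        result[invoice_no] = midpoint(start, end)
    return result


if __name__ == "__main__":
    for invoice_no, mid in invoice_midpoints(ROWS).items():
        print(invoice_no, mid.isoformat(sep=" "))
```

For example, invoice 19893 (11/30/2001 to 12/02/2001) gets 2001-12-01 00:00, and invoice 004 (a 9-day span) gets 2002-10-26 12:00.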

Here is what I have so far:

WITH
    tmp AS
    (
        SELECT invoice_no,
               invoice_start_date, invoice_end_date, check_date, status_code,
               cast(count(*) OVER (PARTITION BY invoice_no) as float) AS total,
               row_number() OVER (PARTITION BY invoice_no ORDER BY invoice_start_date, invoice_end_date, check_date) AS rn
        FROM   INVOICE_HEADER INNER JOIN
               INVOICE_HEADER_CUSTOM ON INVOICE_HEADER.invoice_id = INVOICE_HEADER_CUSTOM.invoice_id
        WHERE  status_code <> 'REJECTED'
          AND  Check_Date BETWEEN CONVERT(DATETIME, '2014-12-01 00:00:00', 102)
                              AND CONVERT(DATETIME, '2014-12-31 00:00:00', 102)
    )

SELECT * FROM tmp WHERE (total / 2.0 - 1) < rn and rn < (total / 2.0 + 1)
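One quick way to sanity-check a row-number filter like the one in the final SELECT is to enumerate which values of rn it keeps for small group sizes. The Python sketch below (not SQL) does that for the strict filter `(total / 2.0 - 1) < rn and rn < (total / 2.0 + 1)` and, for comparison, for inclusive bounds `rn >= total / 2.0 and rn <= total / 2.0 + 1`. The inclusive variant is a suggested alternative, not something from the question; both are checked against the 1-based middle row(s) of a sorted group.

```python
from typing import List


def strict_filter(total: int) -> List[int]:
    """rn values kept by: (total / 2.0 - 1) < rn AND rn < (total / 2.0 + 1)."""
    return [rn for rn in range(1, total + 1)
            if total / 2.0 - 1 < rn < total / 2.0 + 1]


def inclusive_filter(total: int) -> List[int]:
    """rn values kept by: rn >= total / 2.0 AND rn <= total / 2.0 + 1."""
    return [rn for rn in range(1, total + 1)
            if total / 2.0 <= rn <= total / 2.0 + 1]


def middle_rows(total: int) -> List[int]:
    """The middle row (odd total) or the two middle rows (even total), 1-based."""
    if total % 2 == 1:
        return [(total + 1) // 2]
    return [total // 2, total // 2 + 1]


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, strict_filter(n), inclusive_filter(n), middle_rows(n))
```

With 5 rows the strict filter keeps rows 2 and 3, although the median is row 3 alone; with 4 rows it keeps only row 2 instead of rows 2 and 3. The inclusive bounds match the middle row(s) for every group size.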

2 answers:

Answer 0 (score: 1)

You were close; you were just missing the PARTITION BY clause in the count.

WITH
    tmp AS
    (
        SELECT invoice_no,
               dates,  -- placeholder: the date column to take the median of
               cast(count(*) OVER (PARTITION BY invoice_no) as float) AS total,
               row_number() OVER (PARTITION BY invoice_no ORDER BY dates) AS rn
        FROM   INVOICE_HEADER
    )

SELECT *
FROM tmp
-- keeps the middle row (odd total) or the two middle rows (even total)
WHERE rn >= total / 2.0
    and rn <= total / 2.0 + 1

Answer 1 (score: 1)

I would express the median as:

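SELECT invoice_no,
       -- MIN plus half of (MAX - MIN); integer division truncates to whole hours
       DATEADD(hour,
               DATEDIFF(hour, MIN(dates), MAX(dates)) / 2,
               MIN(dates)) AS median_date
FROM (SELECT ih.invoice_no,
             ih.dates,  -- placeholder: the date column to take the median of
             COUNT(*) OVER (PARTITION BY ih.invoice_no) AS cnt,
             ROW_NUMBER() OVER (PARTITION BY ih.invoice_no ORDER BY ih.dates) AS seqnum
      FROM INVOICE_HEADER ih
     ) ih
WHERE 2*seqnum IN (cnt, cnt + 1, cnt + 2)
GROUP BY invoice_no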

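Most of this is probably self-explanatory. You need PARTITION BY to do the calculation separately for each invoice number. You need ORDER BY dates to put the values in the correct order. The WHERE clause is my preferred way of handling the odd/even problem for medians.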
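The WHERE clause works because of parity: when cnt is odd, cnt and cnt + 2 are odd and can never equal 2*seqnum, so only 2*seqnum = cnt + 1 matches (the single middle row); when cnt is even, cnt + 1 is odd, so only seqnum = cnt/2 and cnt/2 + 1 match (the two middle rows). The Python sketch below (not SQL) checks this against the lower and upper median positions from the standard `statistics` module; the function names are hypothetical.

```python
import statistics
from typing import List


def where_clause_rows(cnt: int) -> List[int]:
    """1-based seqnum values satisfying 2*seqnum IN (cnt, cnt + 1, cnt + 2)."""
    return [seqnum for seqnum in range(1, cnt + 1)
            if 2 * seqnum in (cnt, cnt + 1, cnt + 2)]


def expected_middle(cnt: int) -> List[int]:
    """Lower and upper median positions; they coincide when cnt is odd."""
    positions = range(1, cnt + 1)
    return sorted({statistics.median_low(positions),
                   statistics.median_high(positions)})


if __name__ == "__main__":
    for cnt in range(1, 8):
        print(cnt, where_clause_rows(cnt), expected_middle(cnt))
```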

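The last step is a bit tricky. When there is an even number of values, the median is the average of the two middle datetimes, which is somewhat awkward to calculate directly. Instead, take the minimum date and add half of the difference between the maximum and the minimum. With an even number of elements, this yields the average. With an odd number of elements, it also yields the correct value, because the minimum and maximum are the same value.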
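Putting the steps together, here is a Python sketch (not SQL) of the same pipeline on hypothetical in-memory rows: group by invoice, sort the dates, keep the middle value(s) with the 2*seqnum rule, then take MIN + (MAX - MIN) / 2. Unlike an hour-based DATEDIFF in SQL, Python's `timedelta` keeps the exact midpoint. The function name `median_dates` and the sample data are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime


def median_dates(rows):
    """rows: iterable of (invoice_no, datetime). Returns {invoice_no: median datetime}."""
    groups = defaultdict(list)          # PARTITION BY invoice_no
    for invoice_no, d in rows:
        groups[invoice_no].append(d)

    result = {}
    for invoice_no, dates in groups.items():
        dates.sort()                    # ORDER BY dates
        cnt = len(dates)                # COUNT(*) OVER (PARTITION BY invoice_no)
        # Keep the middle one (odd cnt) or two (even cnt) values; seqnum is 1-based.
        middle = [d for seqnum, d in enumerate(dates, start=1)
                  if 2 * seqnum in (cnt, cnt + 1, cnt + 2)]
        lo, hi = min(middle), max(middle)
        # MIN + half of (MAX - MIN): the mean of two values, or the value itself if lo == hi.
        result[invoice_no] = lo + (hi - lo) / 2
    return result


if __name__ == "__main__":
    sample = [
        ("A", datetime(2014, 12, 1)), ("A", datetime(2014, 12, 20)),
        ("A", datetime(2014, 12, 3)), ("A", datetime(2014, 12, 5)),
        ("B", datetime(2014, 12, 11)), ("B", datetime(2014, 12, 2)),
        ("B", datetime(2014, 12, 10)),
    ]
    for invoice_no, median in median_dates(sample).items():
        print(invoice_no, median.isoformat(sep=" "))
```

Invoice "A" has four dates, so the two middle ones (Dec 3 and Dec 5) are averaged to Dec 4; invoice "B" has three, so the single middle date (Dec 10) is returned unchanged.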