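I'm looking for a way to get the MEDIAN of a series of start and end dates (lots and lots of dates). However, it needs to be specific to each invoice number. See the sample data below.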
invoice_no  invoice_start_date  invoice_end_date
4006        11/14/2001          12/15/2004
20071       11/29/2001          02/01/2003
19893       11/30/2001          12/02/2001
19894       11/30/2001          12/04/2001
004         10/22/2002          10/31/2002
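The goal is to find the median between the start date and the end date.
The median is simply whatever value falls in the middle between the start date and end date for a particular invoice_no. I also want to filter the data as much as possible. I realized I can add WHERE status_code <> 'REJECTED', which should also help keep out a lot of unreliable dates. In addition, I only want to look at a specific range of months, so I added a BETWEEN on DATETIME values as well.
Code so far: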
WITH tmp AS
(
    SELECT invoice_no,
           invoice_start_date,
           invoice_end_date,
           check_date,
           status_code,
           cast(count(*) OVER (PARTITION BY invoice_no) as float) AS total,
           row_number() OVER (PARTITION BY invoice_no
                              ORDER BY invoice_start_date, invoice_end_date, check_date) AS rn
    FROM INVOICE_HEADER
    INNER JOIN INVOICE_HEADER_CUSTOM
            ON INVOICE_HEADER.invoice_id = INVOICE_HEADER_CUSTOM.invoice_id
    WHERE status_code <> 'REJECTED'
      AND Check_Date BETWEEN CONVERT(DATETIME, '2014-12-01 00:00:00', 102)
                         AND CONVERT(DATETIME, '2014-12-31 00:00:00', 102)
)
SELECT *
FROM tmp
WHERE (total / 2.0 - 1) < rn
  and rn < (total / 2.0 + 1)
Answer 0 (score: 1)
You're close; you were only missing the PARTITION BY clause in the count statement.
WITH tmp AS
(
    SELECT invoice_no,
           dates,
           cast(count(*) OVER (PARTITION BY invoice_no) as float) AS total,
           row_number() OVER (PARTITION BY invoice_no ORDER BY dates) AS rn
    FROM INVOICE_HEADER
)
SELECT *
FROM tmp
WHERE (total / 2.0 - 1) < rn
  and rn < (total / 2.0 + 1)
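To see which rows that final WHERE actually keeps, here is a small Python sketch (not part of the original answer; rows_kept is a made-up helper name) that applies the same strict inequality to row numbers 1..total. It may be worth comparing the result with the rows you expect for odd and even counts before relying on the filter.

```python
def rows_kept(total):
    """Row numbers rn in 1..total with (total/2 - 1) < rn < (total/2 + 1)."""
    return [rn for rn in range(1, total + 1)
            if (total / 2.0 - 1) < rn < (total / 2.0 + 1)]

kept_odd = rows_kept(5)   # bounds are 1.5 < rn < 3.5
kept_even = rows_kept(4)  # bounds are 1.0 < rn < 3.0
print(kept_odd, kept_even)  # [2, 3] [2]
```

With 5 rows the usual middle row is 3, and with 4 rows the usual middle rows are 2 and 3, so the bounds may need adjusting if that is the definition of median you want.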
Answer 1 (score: 1)
I would express the median as:
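SELECT invoice_no,
       -- midpoint of the middle row(s): min plus half the gap to max
       DATEADD(hour,
               DATEDIFF(hour, MIN(dates), MAX(dates)) / 2,
               MIN(dates)) AS median_date
FROM (SELECT ih.invoice_no,
             ih.dates,
             COUNT(*) OVER (PARTITION BY ih.invoice_no) as cnt,
             ROW_NUMBER() OVER (PARTITION BY ih.invoice_no ORDER BY ih.dates) as seqnum
      FROM INVOICE_HEADER ih
     ) ih
-- keeps one middle row for odd counts, two for even counts
WHERE 2*seqnum in (cnt, cnt + 1, cnt + 2)
GROUP BY invoice_no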
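Most of this is probably self-explanatory. You need PARTITION BY to do the calculation for each invoice number. You need ORDER BY dates to put the values in the right order. The WHERE clause is my preferred way of handling the odd/even issue for medians.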
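As a quick check of that odd/even trick, here is a minimal Python sketch that applies the same membership test to sequence numbers 1..cnt. The helper name middle_rows is made up for illustration.

```python
def middle_rows(cnt):
    """Sequence numbers in 1..cnt where 2*seqnum is cnt, cnt + 1 or cnt + 2."""
    return [seqnum for seqnum in range(1, cnt + 1)
            if 2 * seqnum in (cnt, cnt + 1, cnt + 2)]

odd_rows = middle_rows(5)    # 2*seqnum must be 5, 6 or 7, so only seqnum 3
even_rows = middle_rows(4)   # 2*seqnum must be 4, 5 or 6, so seqnum 2 and 3
print(odd_rows, even_rows)   # [3] [2, 3]
```

An odd count leaves the single middle row and an even count leaves the two middle rows, which is what the GROUP BY then collapses into one value.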
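The last step is a bit tricky. When there is an even number of values, the median is the average of the two middle datetimes. An average of datetimes can be awkward to compute directly. Instead, take the minimum date and add half the difference between the maximum and the minimum. When there is an even number of elements, this produces the average of the two middle values. When there is an odd number of elements, it simply returns the middle value, because the minimum and maximum are the same value.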
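The same arithmetic can be sketched in Python, assuming the middle row or rows have already been picked out. The midpoint helper and the sample dates are made up for illustration.

```python
from datetime import datetime

def midpoint(first, last):
    """Minimum plus half the gap to the maximum."""
    return first + (last - first) / 2

# Even count: two distinct middle values, so the result is their average.
even_case = midpoint(datetime(2001, 11, 29), datetime(2001, 11, 30))

# Odd count: min and max are the same row, so the gap is zero.
odd_case = midpoint(datetime(2001, 11, 29), datetime(2001, 11, 29))

print(even_case)  # 2001-11-29 12:00:00
print(odd_case)   # 2001-11-29 00:00:00
```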