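First off, I'm running on DB2 for i5/OS V5R4. I have ROW_NUMBER(), RANK(), and common table expressions. I do not have TOP n PERCENT or LIMIT OFFSET.
The actual data set I'm working with is hard to explain, so let's just say I have a weather history table with the columns (city, temperature, timestamp). I want to compare the median to the average for each group (city).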
This is the cleanest way I have found to get the median for a whole-table aggregation. I adapted it from an IBM Redbook:
WITH base_t AS
( SELECT temperature, row_number() over (order by temperature) AS rownum FROM t ),
count_t AS
( SELECT COUNT(temperature) + 1 AS base_count FROM base_t ),
median_t AS
( SELECT temperature FROM base_t, count_t
WHERE rownum in (FLOOR(base_count/2e0), CEILING(base_count/2e0)) )
SELECT DECIMAL(AVG(temperature),10,2) AS median FROM median_t
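The rownum filter in median_t relies on a small piece of arithmetic: with n rows, base_count is n + 1, so FLOOR and CEILING of base_count/2 land on the single middle row when n is odd and on the two middle rows when n is even. A minimal Python sketch of that rule, for illustration only (the function name middle_row_numbers is hypothetical and not part of the query):

```python
import math


def middle_row_numbers(n):
    """Return the 1-based row numbers that the rownum filter selects for n rows.

    Mirrors FLOOR(base_count/2e0) and CEILING(base_count/2e0),
    where base_count = COUNT(temperature) + 1.
    """
    base_count = n + 1
    return math.floor(base_count / 2), math.ceil(base_count / 2)


if __name__ == "__main__":
    for n in (1, 4, 5):
        print(n, middle_row_numbers(n))
```

For an odd row count both values are equal, so only one row matches and AVG returns it unchanged; for an even row count AVG takes the mean of the two middle rows.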
The query above returns a single row just fine, but it seems to fall apart for grouping. Conceptually, this is what I want:
SELECT city, AVG(temperature), MEDIAN(temperature) FROM ...
city | mean_temp | median_temp
===================================================
'Minneapolis' | 60 | 64
'Milwaukee' | 65 | 66
'Muskegon' | 70 | 61
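There is probably an answer that makes me look silly, but I'm having a mental block, and this is not the top item on my plate right now. It seems like it should be possible, but I can't use anything terribly complex, since it's a large table and I want to be able to customize which columns are aggregated.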
Answer 0 (score: 1)
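In SQL Server, aggregate functions such as count(*) can be partitioned and computed without a group by. I took a quick look at the referenced Redbook, and it looks like DB2 has the same feature. If it does not, the following will not work: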
create table TemperatureHistory
(City varchar(20)
, Temperature decimal(5, 2)
, DateTaken datetime)
insert into TemperatureHistory values ('Minneapolis', 61, '20090101')
insert into TemperatureHistory values ('Minneapolis', 59, '20090102')
insert into TemperatureHistory values ('Milwaukee', 65, '20090101')
insert into TemperatureHistory values ('Milwaukee', 65, '20090102')
insert into TemperatureHistory values ('Milwaukee', 100, '20090103')
insert into TemperatureHistory values ('Muskegon', 80, '20090101')
insert into TemperatureHistory values ('Muskegon', 70, '20090102')
insert into TemperatureHistory values ('Muskegon', 70, '20090103')
insert into TemperatureHistory values ('Muskegon', 20, '20090104')
; with base_t as
(select city
, Temperature
, row_number() over (partition by city order by temperature) as RowNum
, (count(*) over (partition by city)) + 1 as CountPlusOne
from TemperatureHistory)
select City
, avg(Temperature) as MeanTemp
, avg(case
when RowNum in (FLOOR(CountPlusOne/2.0), CEILING(CountPlusOne/2.0))
then Temperature
else null end) as MedianTemp
from base_t
group by City
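As a way to sanity-check the query's results outside the database, the same logic can be sketched in Python. This is only an illustration of the technique under stated assumptions, not something that runs on DB2: the function name mean_and_median_by_city is hypothetical, and the input is assumed to be a list of (city, temperature) pairs.

```python
import math
from collections import defaultdict


def mean_and_median_by_city(rows):
    """Compute (mean, median) per city from (city, temperature) pairs.

    Mirrors the query: number the rows within each city by temperature,
    then average only the rows whose number equals FLOOR or CEILING
    of (count + 1) / 2.
    """
    groups = defaultdict(list)
    for city, temperature in rows:
        groups[city].append(temperature)

    result = {}
    for city, temps in groups.items():
        temps.sort()  # order by temperature within the partition
        count_plus_one = len(temps) + 1
        wanted = (math.floor(count_plus_one / 2), math.ceil(count_plus_one / 2))
        # Row numbers are 1-based, as with row_number().
        middle = [t for row_num, t in enumerate(temps, start=1) if row_num in wanted]
        mean = sum(temps) / len(temps)
        median = sum(middle) / len(middle)
        result[city] = (mean, median)
    return result


if __name__ == "__main__":
    sample = [
        ("Minneapolis", 61), ("Minneapolis", 59),
        ("Milwaukee", 65), ("Milwaukee", 65), ("Milwaukee", 100),
        ("Muskegon", 80), ("Muskegon", 70), ("Muskegon", 70), ("Muskegon", 20),
    ]
    for city, (mean, median) in mean_and_median_by_city(sample).items():
        print(city, round(mean, 2), median)
```

With the sample rows inserted above, this gives a median of 60 for Minneapolis, 65 for Milwaukee, and 70 for Muskegon, which is what the query's MedianTemp column should show.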