Odd percentile_cont results in PostgreSQL over count(*) per mmdd across many years

Time: 2018-06-14 06:04:35

Tags: sql postgresql

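I believe I am close to getting good percentile results from my table, but I am not quite there. I have used some suggestions from Stack Overflow, and partitioning is helping.

Using PostgreSQL version 9.4.18, PostGIS version 2.2

Here is a query that produces count(*) per mmdd for each year. It uses the function f_mmdd: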

CREATE FUNCTION f_mmdd(date) RETURNS int LANGUAGE sql IMMUTABLE AS
$$SELECT to_char($1, 'MMDD')::int$$; 
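
To make the mmdd key concrete, here is a small sketch of what the expression inside f_mmdd returns. The sample dates are made up for illustration. Because the 'MMDD' string is cast to int, the leading zero is dropped, which is why July 26 shows up as 726 in the results.

```sql
-- Same expression as the body of f_mmdd, applied to literal dates.
SELECT to_char(date '2018-07-26', 'MMDD')::int AS jul_26,  -- 726
       to_char(date '2018-08-07', 'MMDD')::int AS aug_07,  -- 807
       to_char(date '2018-12-31', 'MMDD')::int AS dec_31;  -- 1231
```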

SELECT ct.ct, mmdd, yyyy,
       percentile_cont(ARRAY[0.1, 0.5, 0.9]) WITHIN GROUP (ORDER BY ct.ct ASC)
FROM (
    SELECT f_mmdd(d::date) AS mmdd
    FROM generate_series(timestamp '2018-01-01', timestamp '2018-12-31', interval '1 day') d
) d
LEFT JOIN (
    SELECT f_mmdd((time at time zone 'UTC' at time zone 'america/los_angeles')::date) AS mmdd,
           to_char(time at time zone 'UTC' at time zone 'america/los_angeles', 'YYYY') AS yyyy,
           count(*) OVER (PARTITION BY to_char(time at time zone 'UTC' at time zone 'america/los_angeles', 'YYYY-MM-DD')) AS ct
    FROM fwz c
    JOIN ltg_data d ON ST_contains(c.the_geom, d.ltg_geom)
    WHERE zone = '623'
    GROUP BY 1, d.time
) ct USING (mmdd)
GROUP BY mmdd, ct.ct, ct.yyyy
ORDER BY mmdd ASC;

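Results

This query essentially shows count(*) per mmdd for each year. Many years are empty. Here are examples for two mmdd values. I understand why percentile_cont returns the same number three times: it is computing the percentiles of a single number, not of the group of counts across the years.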

 ct  | mmdd | yyyy | percentile_cont
-----+------+------+-----------------
   6 |  726 | 2003 | {6,6,6}
   7 |  726 | 2013 | {7,7,7}
   8 |  726 | 2010 | {8,8,8}
  10 |  726 | 1998 | {10,10,10}
  12 |  726 | 1988 | {12,12,12}
  28 |  726 | 1996 | {28,28,28}
  35 |  726 | 2004 | {35,35,35}
  41 |  726 | 1995 | {41,41,41}
  90 |  726 | 2017 | {90,90,90}

   1 |  807 | 1989 | {1,1,1}
   3 |  807 | 1993 | {3,3,3}
   7 |  807 | 1999 | {7,7,7}
  16 |  807 | 2008 | {16,16,16}
  17 |  807 | 2009 | {17,17,17}
  22 |  807 | 2017 | {22,22,22}
 151 |  807 | 2003 | {151,151,151}
 157 |  807 | 2013 | {157,157,157}
 400 |  807 | 2006 | {400,400,400}
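
The repeated values in each row can be reproduced in isolation. As a minimal sketch (the inline VALUES list stands in for one group of the query above), a group that contains only one distinct value has nothing to interpolate between, so every requested percentile comes back as that value:

```sql
-- A group holding the single value 6, like the (ct = 6, mmdd = 726, yyyy = 2003) group.
SELECT percentile_cont(ARRAY[0.1, 0.5, 0.9]) WITHIN GROUP (ORDER BY x) AS p
FROM (VALUES (6)) AS v(x);
-- p = {6,6,6}
```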

Now for the query that gets closer to the answer, I think because I only group by mmdd in the outer query:

SELECT mmdd,
       percentile_cont(ARRAY[0.1, 0.5, 0.9]) WITHIN GROUP (ORDER BY ct ASC)
FROM (
    SELECT f_mmdd(d::date) AS mmdd
    FROM generate_series(timestamp '2018-01-01', timestamp '2018-12-31', interval '1 day') d
) d
LEFT JOIN (
    SELECT f_mmdd((time at time zone 'UTC' at time zone 'america/los_angeles')::date) AS mmdd,
           count(*) OVER (PARTITION BY to_char(time at time zone 'UTC' at time zone 'america/los_angeles', 'YYYY-MM-DD')) AS ct
    FROM fwz c
    JOIN ltg_data d ON ST_contains(c.the_geom, d.ltg_geom)
    WHERE zone = '623'
    GROUP BY 1, d.time
) ct USING (mmdd)
GROUP BY mmdd;

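The results are getting closer to what I expect, but there are two problems. There does not seem to be any interpolation between the values in a group, and why is the 0.5 percentile for 807 equal to 400, when that is the maximum value?

Results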

mmdd |      percentile_cont 
726 | {10,41,90}
807 | {151,400,400}
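
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the subquery groups by d.time and then applies count(*) OVER (PARTITION BY day), so it seems to return one row per distinct timestamp, each carrying that day's total. A day with 400 strikes would then feed 400 copies of the value 400 into percentile_cont. The example below imitates this with the per-year counts for mmdd 807 from the first result set; the CTE names daily and repeated are made up for illustration.

```sql
-- Per-year counts for mmdd = 807, copied from the first result set.
WITH daily(yyyy, ct) AS (
    VALUES (1989, 1), (1993, 3), (1999, 7), (2008, 16), (2009, 17),
           (2017, 22), (2003, 151), (2013, 157), (2006, 400)
),
-- Assumption: each day contributes ct rows, all carrying the value ct.
repeated AS (
    SELECT d.yyyy, d.ct
    FROM daily d
    CROSS JOIN LATERAL generate_series(1, d.ct)
)
SELECT
    (SELECT percentile_cont(ARRAY[0.1, 0.5, 0.9]) WITHIN GROUP (ORDER BY ct)
     FROM repeated) AS one_row_per_strike,  -- {151,400,400}
    (SELECT percentile_cont(ARRAY[0.1, 0.5, 0.9]) WITHIN GROUP (ORDER BY ct)
     FROM daily) AS one_row_per_year;       -- roughly {2.6,17,205.6}
```

The first column matches the {151,400,400} above, and the same construction applied to the 726 counts gives {10,41,90}. The 400 copies of 400 occupy more than half of the 774 rows, so the median lands inside that run. It would also account for the apparent lack of interpolation: with long runs of identical values, the two rows on either side of a percentile position usually hold the same number.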

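Do you know what might be going on here? Thanks for your help. I can try to provide any other information that might be useful. I did not include all of the database and table details because I was not sure they would help, and they would make the question longer.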
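
For reference, here is a self-contained sketch of one direction that might be worth trying: collapse the data to one row per local calendar day with a plain GROUP BY, then take percentiles of those daily counts per mmdd. The strikes CTE and its ts column are hypothetical stand-ins for the fwz/ltg_data join, and the timestamps are invented sample data stored as UTC.

```sql
-- Hypothetical UTC timestamps standing in for ltg_data.time within one zone.
WITH strikes(ts) AS (
    VALUES (timestamp '2016-08-07 20:00:00'),
           (timestamp '2016-08-07 20:05:00'),
           (timestamp '2016-08-07 21:00:00'),
           (timestamp '2017-08-07 19:00:00'),
           (timestamp '2018-08-07 22:00:00'),
           (timestamp '2018-08-08 02:00:00')  -- still Aug 7 in Los Angeles
),
-- One row per local day, so each day is counted exactly once.
per_day AS (
    SELECT (ts AT TIME ZONE 'UTC' AT TIME ZONE 'america/los_angeles')::date AS day,
           count(*) AS ct
    FROM strikes
    GROUP BY 1
)
SELECT to_char(day, 'MMDD')::int AS mmdd,
       percentile_cont(ARRAY[0.1, 0.5, 0.9]) WITHIN GROUP (ORDER BY ct) AS p
FROM per_day
GROUP BY 1
ORDER BY 1;
```

With this sample data the daily counts for mmdd 807 are 3, 1 and 2, so the median is 2. Note that years with no strikes on a given day are simply absent here rather than counted as zero; whether they should contribute a 0 depends on what the percentiles are meant to describe.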

0 answers:

No answers