Using PostgreSQL 9.4.18.
Below is a query that returns unexpected results for nonzero_year_count and percent_years_count_not_zero:
Table data:
The real data spans 1988-2018, but for the sqlfiddle the test database is cut down to the 2016-2018 rows shown below.
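-- note: lon and lat are declared int, so the decimal values below are rounded on insert
CREATE TABLE ltg_data
("intensity" int, "time" timestamp with time zone, "lon" int, "lat" int);
INSERT INTO ltg_data ("intensity", "time", "lon", "lat") VALUES
(200, '2018-06-23 07:19:00', -122.109, 42.9446),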
(200, '2018-06-24 07:19:00', -122.109, 42.9446),
(200, '2018-06-25 07:19:00', -122.109, 42.9446),
(200, '2018-06-26 07:19:00', -122.109, 42.9446),
(200, '2018-06-26 07:19:00', -122.109, 42.9446),
(200, '2018-06-24 07:19:00', -122.109, 42.9446),
(200, '2018-06-25 07:19:00', -122.109, 42.9446),
(200, '2018-06-26 07:19:00', -122.109, 42.9446),
(200, '2018-06-26 07:19:00', -122.109, 42.9446),
(200, '2018-06-24 07:19:00', -122.109, 42.9446),
(200, '2018-06-25 07:19:00', -122.109, 42.9446),
(200, '2018-06-26 07:19:00', -122.109, 42.9446),
(200, '2018-06-26 07:19:00', -122.109, 42.9446),
(200, '2018-06-24 07:19:00', -122.109, 42.9446),
(200, '2018-06-25 07:19:00', -122.109, 42.9446),
(200, '2018-06-26 07:19:00', -122.109, 42.9446),
(200, '2018-06-25 17:19:00', -122.109, 42.9446),
(200, '2018-06-25 17:19:00', -122.109, 42.9446),
(200, '2017-06-25 19:19:00', -122.109, 42.9446),
(200, '2017-06-25 20:19:00', -122.109, 42.9446),
(200, '2017-06-26 07:19:00', -122.109, 42.9446),
(200, '2017-06-26 07:19:00', -122.109, 42.9446),
(200, '2017-06-24 07:19:00', -122.109, 42.9446),
(200, '2017-06-24 07:19:00', -122.109, 42.9446),
(200, '2017-06-23 21:19:00', -122.109, 42.9446),
(200, '2017-06-23 21:19:00', -122.109, 42.9446),
(200, '2017-06-24 07:19:00', -122.109, 42.9446),
(200, '2017-06-24 07:19:00', -122.109, 42.9446),
(200, '2017-06-26 07:19:00', -122.109, 42.9446),
(200, '2017-06-26 07:19:00', -122.109, 42.9446),
(200, '2016-06-26 07:19:00', -122.109, 42.9446),
(200, '2016-06-25 07:19:00', -122.109, 42.9446),
(200, '2016-06-25 07:19:00', -122.109, 42.9446),
(200, '2016-06-27 07:19:00', -122.109, 42.9446),
(200, '2016-06-26 07:19:00', -122.109, 42.9446),
(200, '2016-06-26 07:19:00', -122.109, 42.9446);
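The following query should return some basic statistics about the table data. The challenge, I think, is partitioning by week of year and hour of day while somehow merging the years. The incorrect data comes from the part of the query that tries to determine the number of years that had a count > 0 for a given week of year and hour (woyhh). Here are the query and the functions it uses (a helper function that normalizes the day of year to account for leap years). I am using generate_series because I want a full year's worth of values, even where a value has no counts.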
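As a minimal illustration of the zero-filling idea described above (a sketch only, not part of the original query), a generate_series of hourly slots can be LEFT JOINed to ltg_data so that hours without any rows still appear with a count of 0. The five-hour window and the aliases hour_start and ct are assumptions chosen for the example.

```sql
-- Sketch: one row per hour in a small window, even when no data exists for it.
-- The window bounds and the column aliases are illustrative assumptions.
SELECT
    d AS hour_start
    ,COUNT(l."time") AS ct          -- counts only matched rows, so empty hours give 0
FROM
    generate_series(timestamp '2018-06-26 05:00',
                    timestamp '2018-06-26 09:00',
                    interval '1 hour') AS d
    LEFT JOIN ltg_data AS l ON date_trunc('hour', l."time") = d
GROUP BY d
ORDER BY d;
```

Counting l."time" rather than using count(*) matters here: count(*) would report 1 for an unmatched hour, because the LEFT JOIN still produces one row for it.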
Functions:
create or replace function IsLeapYear(int)
returns boolean as $$
    -- Gregorian rule: divisible by 4, except century years not divisible by 400
    select $1 % 4 = 0 and ($1 % 100 <> 0 or $1 % 400 = 0)
$$ LANGUAGE sql IMMUTABLE STRICT;

create or replace function f_woyhh(timestamp with time zone)
returns int language plpgsql as $$
declare
    currentYear int = extract(year from $1);
    -- shift is 2 after Feb 28 of a leap year, otherwise 1
    LeapYearShift int = 1 + (IsLeapYear(currentYear) and $1 > make_date(currentYear, 2, 28))::int;
begin
    -- week of year (1-based) concatenated with the two-digit hour, e.g. 2607
    return CONCAT(((extract(doy from $1)::int) - LeapYearShift) / 7 + 1, to_char($1, 'HH24'));
end;
$$;
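To make the woyhh encoding concrete, here is a small sketch that calls f_woyhh on three timestamps taken from the sample data. The expected values in the comments follow from the function body as written and assume the literals are read and formatted in the same session time zone.

```sql
-- Sketch: sample calls to f_woyhh using timestamps from the test data.
SELECT f_woyhh(timestamptz '2018-06-26 07:19:00');  -- day 177: (177 - 1) / 7 + 1 = 26, hour 07 -> 2607
SELECT f_woyhh(timestamptz '2016-06-25 07:19:00');  -- leap year, day 177: (177 - 2) / 7 + 1 = 26 -> 2607
SELECT f_woyhh(timestamptz '2017-06-24 07:19:00');  -- day 175: (175 - 1) / 7 + 1 = 25 -> 2507
```

The division is integer division, so each week bucket covers seven consecutive days of the shifted day-of-year value.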
Query:
WITH
CTE_Dates
AS
(
    SELECT f_woyhh(d) as dt
        ,EXTRACT(YEAR FROM d::timestamp) AS dtYear
    FROM
        generate_series(timestamp '2016-01-01', timestamp '2018-12-31', interval '1 hour') as d
        -- full range of possible dates
)
,CTE_WeeklyHourlyCounts
AS
(
    SELECT
        f_woyhh(time) as dt
        ,time
        ,count(*) AS ct
    FROM
        ltg_data
    GROUP BY ltg_data.time
)
,CTE_FullStats
AS
(
    SELECT
        CTE_Dates.dt as woyhh
        ,COUNT(DISTINCT CTE_Dates.dtYear) AS years_count
        ,SUM(CASE WHEN CTE_WeeklyHourlyCounts.ct > 0 THEN 1 ELSE 0 END) OVER (PARTITION BY CTE_Dates.dt) AS nonzero_year_count
        ,100.0 * SUM(CASE WHEN CTE_WeeklyHourlyCounts.ct > 0 THEN 1 ELSE 0 END) OVER (PARTITION BY CTE_Dates.dt)
            / COUNT(DISTINCT CTE_Dates.dtYear) as percent_years_count_not_zero
    FROM
        CTE_Dates
        LEFT JOIN CTE_WeeklyHourlyCounts ON CTE_WeeklyHourlyCounts.dt = CTE_Dates.dt
    GROUP BY CTE_Dates.dt, CTE_WeeklyHourlyCounts.ct, CTE_WeeklyHourlyCounts.dt
)
SELECT
    woyhh
    ,nonzero_year_count
    ,years_count
    ,percent_years_count_not_zero
FROM
    CTE_FullStats
WHERE woyhh::text like '26%'
GROUP BY woyhh, years_count, nonzero_year_count, percent_years_count_not_zero
ORDER BY woyhh
Unexpected results:
woyhh | nonzero_year_count | years_count | percent_years_count_not_zero
2605  | 0                  | 3           | 0
2606  | 0                  | 3           | 0
2607  | 5                  | 3           | 200
2608  | 0                  | 3           | 0
2609  | 0                  | 3           | 0
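The incorrect part of the result is the row for 2607: nonzero_year_count should be 3, because there are only 3 years of data and each of those years has counts in week 26, hour 07 (any day after the 24th of the month falls in week 26). Likewise, percent_years_count_not_zero should be 100%, not 200%; 100% is the maximum value I expect for percent_years_count_not_zero.
Desired results:
woyhh | nonzero_year_count | years_count | percent_years_count_not_zero
2605  | 0                  | 3           | 0
2606  | 0                  | 3           | 0
2607  | 3                  | 3           | 100
2608  | 0                  | 3           | 0
2609  | 0                  | 3           | 0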
So I think the main problem lies in this part of the query:
,SUM(CASE WHEN CTE_WeeklyHourlyCounts.ct > 0 THEN 1 ELSE 0 END) OVER (PARTITION BY CTE_Dates.dt) AS nonzero_year_count
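I am partitioning by woyhh, but that is not enough, because I also need to account for the year. It seems I need to group by year somehow, determine whether at least one event occurred in that year for the given week and hour, and then count that year only once. I tried incorporating the year, but got even stranger results.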
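One way to express "count each year at most once" is sketched below. It is an assumption about what might work, not a confirmed fix: it reduces both sides to distinct (woyhh, year) pairs first and then counts distinct years per woyhh. The names CTE_YearHits and yr are hypothetical, introduced only for this example.

```sql
-- Sketch: count a year once per woyhh if it has at least one row in that bucket.
-- CTE_YearHits and yr are hypothetical names introduced for this example.
WITH
CTE_Dates
AS
(
    SELECT DISTINCT
        f_woyhh(d) AS dt
        ,EXTRACT(YEAR FROM d)::int AS dtYear
    FROM
        generate_series(timestamp '2016-01-01', timestamp '2018-12-31', interval '1 hour') AS d
)
,CTE_YearHits
AS
(
    -- one row per (woyhh, year) that has any data at all
    SELECT DISTINCT
        f_woyhh("time") AS dt
        ,EXTRACT(YEAR FROM "time")::int AS yr
    FROM
        ltg_data
)
SELECT
    CTE_Dates.dt AS woyhh
    ,COUNT(DISTINCT CTE_YearHits.yr) AS nonzero_year_count
    ,COUNT(DISTINCT CTE_Dates.dtYear) AS years_count
    ,100.0 * COUNT(DISTINCT CTE_YearHits.yr)
        / COUNT(DISTINCT CTE_Dates.dtYear) AS percent_years_count_not_zero
FROM
    CTE_Dates
    LEFT JOIN CTE_YearHits
        ON CTE_YearHits.dt = CTE_Dates.dt
        AND CTE_YearHits.yr = CTE_Dates.dtYear
WHERE CTE_Dates.dt::text LIKE '26%'
GROUP BY CTE_Dates.dt
ORDER BY CTE_Dates.dt;
```

Because the join also matches on year, a year can contribute at most one row per woyhh, so the non-zero count cannot exceed years_count. On the sample rows above this should give 3 and 100 for woyhh 2607, though I have only traced it by hand against the 2016-2018 test data.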
I hope this clarifies my question. I have added an updated sqlfiddle below that reproduces the data and query for the test table. Thanks for your help!