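I have a number of weather locations with wind forecast data. For each forecast hour, I need the most recent as_of before 10:00 on the previous day. I need this for every hour of every day.
A location is defined as a unique lat and lon pair.
Full table schema with relevant sample data: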
CREATE SCHEMA weather
CREATE TABLE weather.forecast
(
foretime timestamp without time zone NOT NULL,
as_of timestamp without time zone NOT NULL, -- in UTC
summary text,
precipintensity numeric(8,4),
precipprob numeric(2,2),
temperature numeric(5,2),
apptemp numeric(5,2),
dewpoint numeric(5,2),
humidity numeric(2,2),
windspeed numeric(5,2),
windbearing numeric(4,1),
visibility numeric(5,2),
cloudcover numeric(4,2),
pressure numeric(6,2),
ozone numeric(5,2),
preciptype text,
lat numeric(8,6) NOT NULL,
lon numeric(9,6) NOT NULL,
CONSTRAINT forecast_pkey PRIMARY KEY (foretime, as_of, lat, lon)
);
INSERT INTO weather.forecast
(windspeed, foretime, as_of, lat, lon)
VALUES
(11.19, '2/1/2016 8:00', '1/30/2016 23:00', 34.556, 28.345),
(10.98, '2/1/2016 8:00', '1/31/2016 5:00', 34.556, 28.345),
(10.64, '2/1/2016 8:00', '1/31/2016 11:00', 34.556, 28.345),
(10.95, '2/1/2016 8:00', '1/31/2016 8:00', 29.114, 16.277),
(10.39, '2/1/2016 8:00', '1/31/2016 23:00', 29.114, 16.277),
(9.22, '2/1/2016 8:00', '1/31/2016 5:00', 29.114, 16.277),
(10, '2/1/2016 9:00', '1/30/2016 04:00', 34.556, 28.345),
(9.88, '2/1/2016 9:00', '1/31/2016 09:00', 34.556, 28.345),
(10.79, '2/1/2016 9:00', '1/30/2016 23:00', 34.556, 28.345),
(10.8, '2/1/2016 9:00', '1/31/2016 5:00', 29.114, 16.277),
(10.35, '2/1/2016 9:00', '1/31/2016 11:00', 29.114, 16.277),
(10.07, '2/1/2016 9:00', '1/31/2016 17:00', 29.114, 16.277)
;
Desired result format:
lat lon Foredate foreHE windspeed as_of
34.556 28.345 2/1/2016 8 10.98 1/31/2016 5:00
34.556 28.345 2/1/2016 9 9.88 1/31/2016 9:00
29.114 16.277 2/1/2016 8 10.95 1/31/2016 8:00
29.114 16.277 2/1/2016 9 10.80 1/31/2016 5:00
Here is the code that gets the correct as_of. Things go wrong when I try to join in the windspeed.
SELECT
date_trunc('day', (a.foretime)) :: DATE AS Foredate,
extract(HOUR FROM (a.foretime)) AS foreHE,
a.lat,
a.lon,
max(a.as_of) - interval '5 hours' as latest_as_of
FROM weather.forecast a
WHERE date_trunc('day', foretime) :: DATE - as_of >= INTERVAL '14 hours'
GROUP BY Foredate, foreHE, a.lat, a.lon
Answer 0 (score: 2)
Your error, when adding back windspeed, is this:
[42803] ERROR: column "a.windspeed" must appear in the GROUP BY clause or be used in an aggregate function
Position: 184
I can't really improve on PostgreSQL's error message, except maybe to get into the theory a bit. Basically, when you use GROUP BY you're affording yourself the luxury of acting over subsets within the larger set, which is the table represented by the rest of the query. But SQL doesn't hand you those subsets to iterate over; you must tell the database your computations and let it give you back another flat list.
Of the two options presented by Postgres, usually one of them is the obvious choice. For instance, if you left out a.lon, it would be obvious that you're grouping only by latitude and not by longitude, and you would agree that it should be added to the GROUP BY clause. But in this case, if you group by the actual measurements, every subset will have just one row, and that isn't useful either. So at first blush it seems like you need an aggregate. Problem number two is that there isn't an aggregate that solves this problem for you. Sigh!
So here's my thinking. The primary key you need to look up by is (foretime, as_of, lat, lon), and you can get that straightforwardly with this query:
select
foretime,
max(as_of) as as_of,
lat, lon
from weather.forecast
group by foretime, lat, lon;
Now you can join this back onto the same table, forecast, to get the most recent forecasts:
select
date_trunc('day', a.foretime)::date as forecast_day,
extract(hour from a.foretime) as forecast_hour,
a.lat, a.lon,
f.windspeed,
a.as_of - interval '5 hours' as latest_as_of
from weather.forecast f
join (select
foretime,
max(as_of) as as_of,
lat, lon
from weather.forecast
group by foretime, lat, lon) a using (foretime, as_of, lat, lon);
This produces the following report:
forecast_day | forecast_hour | lat | lon | windspeed | latest_as_of
--------------+---------------+-----------+-----------+-----------+---------------------
2016-02-01 | 8 | 34.556000 | 28.345000 | 10.64 | 2016-01-31 06:00:00
2016-02-01 | 8 | 29.114000 | 16.277000 | 10.39 | 2016-01-31 18:00:00
2016-02-01 | 9 | 34.556000 | 28.345000 | 9.88 | 2016-01-31 04:00:00
2016-02-01 | 9 | 29.114000 | 16.277000 | 10.07 | 2016-01-31 12:00:00
(4 rows)
There is probably a more efficient way to do this with correlated subqueries but I'm not sure how to accomplish it.
Edit: Matching your output format:
select
a.lat, a.lon,
date_trunc('day', a.foretime)::date as forecast_day,
extract(hour from a.foretime) as forecast_hour,
f.windspeed,
a.as_of - interval '5 hours' as latest_as_of
from weather.forecast f
join (select
foretime,
max(as_of) as as_of,
lat, lon
from weather.forecast
where date_trunc('day', foretime)::date - as_of >= interval '14 hours'
group by foretime, lat, lon) a using (foretime, as_of, lat, lon)
order by lat desc, lon, foretime;
Result:
lat | lon | forecast_day | forecast_hour | windspeed | latest_as_of
-----------+-----------+--------------+---------------+-----------+---------------------
34.556000 | 28.345000 | 2016-02-01 | 8 | 10.98 | 2016-01-31 00:00:00
34.556000 | 28.345000 | 2016-02-01 | 9 | 9.88 | 2016-01-31 04:00:00
29.114000 | 16.277000 | 2016-02-01 | 8 | 10.95 | 2016-01-31 03:00:00
29.114000 | 16.277000 | 2016-02-01 | 9 | 10.80 | 2016-01-31 00:00:00
(4 rows)
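On the question of a more efficient form: one alternative to the self-join (not the correlated-subquery approach mentioned above, but a different technique) is PostgreSQL's DISTINCT ON, which keeps the first row of each group according to the ORDER BY. The following is a minimal sketch, assuming the same weather.forecast table and the same 14-hour cutoff as above; the column aliases simply mirror the earlier queries. For the sample data it should return the same four rows, but whether it is actually faster depends on the indexes and data volume, so it is worth comparing the plans with EXPLAIN ANALYZE.

```sql
-- Sketch: pick the latest qualifying forecast per (lat, lon, foretime)
-- in a single pass over the table, using PostgreSQL's DISTINCT ON.
select distinct on (lat, lon, foretime)
    lat, lon,
    date_trunc('day', foretime)::date as forecast_day,
    extract(hour from foretime)       as forecast_hour,
    windspeed,
    as_of - interval '5 hours'        as latest_as_of
from weather.forecast
-- keep only forecasts issued at least 14 hours before midnight of the
-- forecast day, i.e. by 10:00 on the previous day
where date_trunc('day', foretime)::date - as_of >= interval '14 hours'
-- the leading ORDER BY columns must match the DISTINCT ON list;
-- the trailing "as_of desc" makes the newest row the one that is kept
order by lat desc, lon, foretime, as_of desc;
```

Because windspeed comes from the same row that DISTINCT ON keeps, there is no need for a GROUP BY or a join back to the table.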