PostgreSQL query: get the latest forecast before a deadline and compare it with actuals

Time: 2016-11-19 00:36:55

Tags: postgresql greatest-n-per-group

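I want to look at historical actual and forecast wind, broken down by hour and day.

I have multiple forecasts for any given hour of the day. I trade for the next day with a deadline of 10 AM US Eastern time, so I want the latest forecast issued before that deadline, shown alongside the actual wind for the same hour.

Complicating matters, the timestamps are stored in GMT, which is 5 hours ahead of US Eastern time.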
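
A note on the offset: a fixed 5-hour shift matches Eastern Standard Time only, and Eastern Daylight Time is 4 hours behind GMT. The sketch below compares the fixed offset with PostgreSQL's AT TIME ZONE conversion. It assumes the stored values are UTC in a timestamp without time zone column, as in the tables in this question; the inline sample values and the alias names are for illustration only.

```sql
-- Compare a fixed UTC-5 shift with a daylight-saving-aware conversion.
-- The first AT TIME ZONE marks the naive value as UTC (giving timestamptz);
-- the second converts it to Eastern local time (a naive timestamp again).
SELECT
    ts_utc,
    ts_utc - interval '5 hours'                               AS eastern_fixed_offset,
    ts_utc AT TIME ZONE 'UTC' AT TIME ZONE 'America/New_York' AS eastern_dst_aware
FROM (VALUES
    (timestamp '2016-02-01 08:00'),   -- winter: both columns agree
    (timestamp '2016-07-01 08:00')    -- summer: the two columns differ by one hour
) AS t(ts_utc);
```

If the data only ever covers winter months the fixed offset is enough; otherwise the time zone name is the safer choice.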

Here is the query I have so far:

   WITH
   forecast_prep AS (
       SELECT
             date_trunc('day', (foretime - interval '5 hours')) :: DATE AS Foredate,
             extract(HOUR FROM (foretime - interval '5 hours')) + 1     AS foreHE,
             lat,
             lon,
             max(windspeed) as forecast,
             max(as_of) - interval '5 hours'      AS as_of
       FROM weather.forecast
       WHERE date_trunc('day', foretime) :: DATE - as_of >= INTERVAL '9 hours'
       GROUP BY Foredate, foreHE, lat, lon
  ),
  tmp AS (
     SELECT
       meso.station,
       meso.lat,
       meso.lon,
       (meso.timestmp - interval '5 hours') as timestmp,
       date_trunc('day', (meso.timestmp - interval '5 hours')) :: DATE  AS Date,
       extract(HOUR FROM (meso.timestmp - interval '5 hours')) + 1      AS HE,
       CAST(AVG(meso.windspd) AS NUMERIC(19, 2)) AS Actual
     FROM weather.meso
     GROUP BY station, lat, lon, timestmp, Date, HE
  )
SELECT 
   tmp.station, tmp.Date, tmp.HE, tmp.Actual, forecast_prep.forecast, forecast_prep.as_of
FROM tmp
INNER JOIN forecast_prep ON (
   tmp.lat = forecast_prep.lat 
   AND tmp.lon = forecast_prep.lon 
   AND tmp.Date = forecast_prep.Foredate
   AND tmp.HE = forecast_prep.foreHE
)
WHERE 
   (tmp.timestmp BETWEEN '2016-02-01' AND '2016-02-02') 
   AND (tmp.station = 'KSBN')
GROUP BY 
   tmp.station, tmp.Date, tmp.HE, forecast_prep.forecast, forecast_prep.as_of, tmp.Actual
ORDER BY tmp.Date, tmp.HE ASC;

Here is the full table structure with relevant sample data:

CREATE SCHEMA weather;
CREATE TABLE weather.forecast
  (
  foretime timestamp without time zone NOT NULL,
  as_of timestamp without time zone NOT NULL, -- in UTC
  summary text,
  precipintensity numeric(8,4),
  precipprob numeric(2,2),
  temperature numeric(5,2),
  apptemp numeric(5,2),
  dewpoint numeric(5,2),
  humidity numeric(2,2),
  windspeed numeric(5,2),
  windbearing numeric(4,1),
  visibility numeric(5,2),
  cloudcover numeric(4,2),
  pressure numeric(6,2),
  ozone numeric(5,2),
  preciptype text,
  lat numeric(8,6) NOT NULL,
  lon numeric(9,6) NOT NULL,
  CONSTRAINT forecast_pkey PRIMARY KEY (foretime, as_of, lat, lon)
  );

INSERT INTO weather.forecast
    (windspeed, foretime, as_of, lat, lon)
VALUES
  (11.19,   '2/1/2016 8:00', '1/30/2016 23:00', 34.556, 28.345),
  (10.98,   '2/1/2016 8:00',    '1/31/2016 5:00', 34.556, 28.345),
  (10.64,   '2/1/2016 8:00',    '1/31/2016 11:00', 34.556, 28.345),
  (10.95,   '2/1/2016 8:00',    '1/31/2016 17:00', 34.556, 28.345),
  (10.39,   '2/1/2016 8:00',    '1/31/2016 23:00', 34.556, 28.345),
  (9.22,    '2/1/2016 8:00',    '2/1/2016 5:00', 34.556, 28.345),
  (10,  '2/1/2016 9:00',    '1/30/2016 11:00', 34.556, 28.345),
  (9.88,    '2/1/2016 9:00',    '1/30/2016 17:00', 34.556, 28.345),
  (10.79,   '2/1/2016 9:00',    '1/30/2016 23:00', 34.556, 28.345),
  (10.8,    '2/1/2016 9:00',    '1/31/2016 5:00', 34.556, 28.345),
  (10.35,   '2/1/2016 9:00',    '1/31/2016 11:00', 34.556, 28.345),
  (10.07,   '2/1/2016 9:00',    '1/31/2016 17:00', 34.556, 28.345),
  (9.57,    '2/1/2016 9:00',    '1/31/2016 23:00', 34.556, 28.345),
  (7.93,    '2/1/2016 9:00',    '2/1/2016 5:00', 34.556, 28.345)
;

CREATE TABLE weather.meso
(
  timestmp timestamp without time zone NOT NULL,
  station text NOT NULL,
  lat numeric NOT NULL,
  lon numeric NOT NULL,
  tmp numeric,
  hum numeric,
  windspd numeric,
  winddir integer,
  dew numeric,
  CONSTRAINT meso_pkey PRIMARY KEY (timestmp, station, lat, lon)
);
INSERT INTO weather.meso
    (station, timestmp, lat, lon, windspd)
VALUES
  ('KSBN',  '2/1/2016 8:02', 34.556, 28.345, 16.1),
  ('KSBN',  '2/1/2016 8:12', 34.556, 28.345, 12.6),
  ('KSBN',  '2/1/2016 8:54', 34.556, 28.345, 11.5),
  ('KSBN',  '2/1/2016 9:02', 34.556, 28.345, 18.1),
  ('KSBN',  '2/1/2016 9:17', 34.556, 28.345, 12.2),
  ('KSBN',  '2/1/2016 9:48', 34.556, 28.345, 11.5)
;

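1 answer:

Answer 0: (score: 0)

The DDL and sample data do help understanding, but about all I can offer in more detail is how to make use of row_number(), for example as follows. This is also available online at http://rextester.com/FIEUPI83002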

select
  row_number() OVER(PARTITION BY date_trunc('day', (foretime - interval '5 hours')) :: DATE 
                    ORDER BY case when extract(HOUR FROM (foretime - interval '5 hours')) < 10 then 1 else 2 end, AS_OF desc) AS rn
, extract(HOUR FROM (foretime - interval '5 hours')) HR
, foretime
, as_of
from forecast
order by RN, as_of DESC

The result, from the available sample data, is as follows:

+----+----+-----------+---------------------+---------------------+
|    | rn | date_part |      foretime       |        as_of        |
+----+----+-----------+---------------------+---------------------+
|  1 |  1 |         4 | 01.02.2016 09:00:00 | 01.02.2016 05:00:00 |
|  2 |  2 |         3 | 01.02.2016 08:00:00 | 01.02.2016 05:00:00 |
|  3 |  3 |         4 | 01.02.2016 09:00:00 | 31.01.2016 23:00:00 |
|  4 |  4 |         3 | 01.02.2016 08:00:00 | 31.01.2016 23:00:00 |
|  5 |  5 |         4 | 01.02.2016 09:00:00 | 31.01.2016 17:00:00 |
|  6 |  6 |         3 | 01.02.2016 08:00:00 | 31.01.2016 17:00:00 |
|  7 |  7 |         4 | 01.02.2016 09:00:00 | 31.01.2016 11:00:00 |
|  8 |  8 |         3 | 01.02.2016 08:00:00 | 31.01.2016 11:00:00 |
|  9 |  9 |         3 | 01.02.2016 08:00:00 | 31.01.2016 05:00:00 |
| 10 | 10 |         4 | 01.02.2016 09:00:00 | 31.01.2016 05:00:00 |
| 11 | 11 |         3 | 01.02.2016 08:00:00 | 30.01.2016 23:00:00 |
| 12 | 12 |         4 | 01.02.2016 09:00:00 | 30.01.2016 23:00:00 |
| 13 | 13 |         4 | 01.02.2016 09:00:00 | 30.01.2016 17:00:00 |
| 14 | 14 |         4 | 01.02.2016 09:00:00 | 30.01.2016 11:00:00 |
+----+----+-----------+---------------------+---------------------+

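So if you apply the filter WHERE RN = 1, the "most recent" row before 10 for each day should be listed. I believe something like this will suit your requirement. Note that you can adjust the case expression and the other columns that order the row_number sequence (inside the OVER() clause) to meet your needs.

Original comment follows

In the absence of sample data I will discuss an approach; I suggest using ROW_NUMBER() OVER(ORDER BY date_time_column DESC), e.g.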
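
Applied to the tables in the question, the same idea can be sketched as below. This is one possible reading of the requirement rather than a tested solution. It assumes the cutoff is 10:00 Eastern on the day before the forecast hour's Eastern date, that the fixed 5-hour offset used in the question is acceptable, and that each forecast hour is compared with the observations that fall inside the same UTC hour. The CTE names ranked_forecast and hourly_actual are made up for illustration.

```sql
WITH ranked_forecast AS (
    SELECT
        f.foretime,
        f.lat,
        f.lon,
        f.windspeed,
        f.as_of,
        -- rn = 1 marks the latest forecast run that met the cutoff
        row_number() OVER (
            PARTITION BY f.foretime, f.lat, f.lon
            ORDER BY f.as_of DESC
        ) AS rn
    FROM weather.forecast AS f
    -- keep only runs issued before 10:00 Eastern on the previous Eastern day
    WHERE f.as_of - interval '5 hours'
          < date_trunc('day', f.foretime - interval '5 hours')
            - interval '1 day' + interval '10 hours'
),
hourly_actual AS (
    -- average the observations within each UTC hour
    SELECT
        m.station,
        m.lat,
        m.lon,
        date_trunc('hour', m.timestmp)  AS hour_utc,
        avg(m.windspd)::numeric(19, 2)  AS actual
    FROM weather.meso AS m
    GROUP BY m.station, m.lat, m.lon, date_trunc('hour', m.timestmp)
)
SELECT
    a.station,
    (a.hour_utc - interval '5 hours')::date                      AS local_date,
    extract(hour FROM a.hour_utc - interval '5 hours')::int + 1  AS he,
    a.actual,
    r.windspeed                   AS forecast,
    r.as_of - interval '5 hours'  AS as_of_eastern
FROM hourly_actual AS a
JOIN ranked_forecast AS r
  ON  r.foretime = a.hour_utc
  AND r.lat = a.lat
  AND r.lon = a.lon
WHERE r.rn = 1
  AND a.station = 'KSBN'
ORDER BY local_date, he;
```

With the sample rows from the question, this should pick the forecast issued at 2016-01-31 11:00 UTC for both hours, because the 17:00 UTC run falls after the 10:00 Eastern cutoff.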

select
*
from (
  select *
    , ROW_NUMBER() OVER(ORDER BY timestmp DESC) AS RN
  from forecast_table
  -- where timestmp < 10 am (include required logic here)
  ) AS ranked  -- PostgreSQL requires an alias on a subquery in FROM (optional from version 16)
WHERE RN = 1

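Because of the DESCending order, the row with the value 1 in the calculated column RN will be the most recent row. This can be combined with PARTITION BY, so the row_number approach can be used to find the "most recent" row or the "oldest" row, or even the row with the maximum or minimum value, either per partition or overall.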
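
A minimal sketch of the PARTITION BY variant, using a few of the sample forecast rows inline so that it runs without any tables. The column names rn_latest and rn_oldest and the alias ranked are illustrative only.

```sql
SELECT foretime, as_of, windspeed
FROM (
    SELECT
        t.*,
        -- numbering restarts for each foretime; 1 = newest run
        row_number() OVER (PARTITION BY foretime ORDER BY as_of DESC) AS rn_latest,
        -- same partitions, opposite order; 1 = oldest run
        row_number() OVER (PARTITION BY foretime ORDER BY as_of ASC)  AS rn_oldest
    FROM (VALUES
        (timestamp '2016-02-01 08:00', timestamp '2016-01-31 23:00', 10.39),
        (timestamp '2016-02-01 08:00', timestamp '2016-02-01 05:00',  9.22),
        (timestamp '2016-02-01 09:00', timestamp '2016-01-31 23:00',  9.57),
        (timestamp '2016-02-01 09:00', timestamp '2016-02-01 05:00',  7.93)
    ) AS t(foretime, as_of, windspeed)
) AS ranked
WHERE rn_latest = 1   -- switch to rn_oldest = 1 for the earliest run per hour
ORDER BY foretime;
```

Here one row per foretime comes back, each with the 2016-02-01 05:00 run. Ordering by windspeed instead of as_of would return the maximum or minimum row per partition in the same way.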