How to join on the nearest date in PostgreSQL

Time: 2019-04-29 19:54:54

Tags: postgresql date join

Suppose I have the following tables.

product_prices

product|price|date
-------+-----+----------
apple  |10   |2014-03-01
-------+-----+----------
apple  |20   |2014-05-02
-------+-----+----------
egg    |2    |2014-03-03
-------+-----+----------
egg    |4    |2015-10-12

purchases:

user|product|date
----+-------+----------
John|apple  |2014-03-02
----+-------+----------
John|apple  |2014-06-03
----+-------+----------
John|egg    |2014-08-13
----+-------+----------
John|egg    |2016-08-13

What I need is a table similar to this:

name|product|purchase date |price date|price
----+-------+--------------+----------+-----
John|apple  |2014-03-02    |2014-03-01|10
----+-------+--------------+----------+-----
John|apple  |2014-06-03    |2014-05-02|20
----+-------+--------------+----------+-----
John|egg    |2014-08-13    |2014-03-03|2
----+-------+--------------+----------+-----
John|egg    |2016-08-13    |2015-10-12|4

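Put differently: "what was the price of the product on that day?" The price is the one in effect on the purchase date, according to the dates in the product_prices table. On the real database I tried something like the following: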

SELECT name, product, pu.date, pp.date, pp.price
FROM purchases AS pu
LEFT JOIN product_prices AS pp
ON pu.date = (
              SELECT date
              FROM product_prices
              ORDER BY date DESC LIMIT 1);

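But I either get only the left part of the table (with (null) in place of the price date and price), or a lot of rows with every combination of price and date.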

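4 Answers:

Answer 0: (score: 1)

I would suggest changing the product_prices table to use a daterange column (or at least a start_date and an end_date).

You can use an exclusion constraint to ensure that you never have overlapping ranges for one product, and an insert trigger that "closes" the "current" price and creates a new unbounded range for the newly inserted price.

A daterange can be indexed efficiently, and with it the query becomes quite easy: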

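SELECT pu."user" AS name, pu.product, pu.date, pp.valid_during, pp.price
FROM purchases AS pu
  LEFT JOIN product_prices AS pp
         ON pp.product = pu.product        -- match the product, not only the date
        AND pu.date <@ pp.valid_during;    -- purchase date falls inside the price's range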

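(assuming the range column is named valid_during)

However, the exclusion constraint shown below assumes that the product is an integer key (not a varchar), but I guess your real product_prices table uses a foreign key to some products table anyway (which would be an integer).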

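The new table definition might look something like this:

create table product_prices
(
   product_id    integer       not null references products,
   price         numeric(16,4) not null,
   valid_during  daterange     not null
);

And the constraint that prevents overlapping ranges:

alter table product_prices
  add constraint check_price_range
  exclude using gist (product_id with =, valid_during with &&);

The constraint requires the btree_gist extension.

As always, the faster query comes at a price, in this case the higher maintenance cost of the GiST index. You would need to run some tests to see whether the simpler (and possibly faster) query outweighs the slower insert performance on product_prices.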
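
The insert trigger mentioned in this answer is not spelled out, so here is a minimal sketch of one way it could look. Everything named here is an assumption made for illustration: the `products` and `product_prices` tables are recreated inside the block so that it runs on its own, and `close_current_price` and `product_prices_close_current` are hypothetical names. It does not try to handle concurrent inserts or back-dated prices.

```sql
-- Assumed names for this sketch: products, product_prices, close_current_price.
CREATE EXTENSION IF NOT EXISTS btree_gist;  -- lets GiST handle "=" on the integer column

CREATE TABLE products (
    id   integer PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE product_prices (
    product_id   integer       NOT NULL REFERENCES products,
    price        numeric(16,4) NOT NULL,
    valid_during daterange     NOT NULL,
    EXCLUDE USING gist (product_id WITH =, valid_during WITH &&)
);

CREATE FUNCTION close_current_price() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- End the still-open range for this product where the new range starts.
    UPDATE product_prices
       SET valid_during = daterange(lower(valid_during), lower(NEW.valid_during), '[)')
     WHERE product_id = NEW.product_id
       AND upper_inf(valid_during);
    RETURN NEW;  -- let the INSERT of the new price proceed
END;
$$;

-- EXECUTE FUNCTION needs PostgreSQL 11 or later; older versions spell it EXECUTE PROCEDURE.
CREATE TRIGGER product_prices_close_current
    BEFORE INSERT ON product_prices
    FOR EACH ROW EXECUTE FUNCTION close_current_price();

-- Usage: each new price is inserted with an open-ended range.
INSERT INTO products VALUES (1, 'apple');
INSERT INTO product_prices VALUES (1, 10, daterange('2014-03-01', NULL));
INSERT INTO product_prices VALUES (1, 20, daterange('2014-05-02', NULL));

SELECT * FROM product_prices ORDER BY valid_during;
```

After the second insert, the first row's range should have been closed to [2014-03-01,2014-05-02), while the new row stays open-ended. Because the trigger fires before the insert, the old range is already shortened when the exclusion constraint checks the new row.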

Answer 1: (score: 0)

You could try something like this, although I'm sure there are better ways:

with diffs as (
  select
      a.*,
      b."date" as bdate,
      b.price,
      b."date" - a."date" as diffdays,
      row_number() over (
        partition by "user", a."product", a."date"
        order by "user", a."product", a."date", b."date" - a."date" desc
      ) as sr
  from purchases a
  inner join product_prices b on a.product = b.product
  where b."date" - a."date" < 1
)
select
    "user" as "name",
    product,
    "date" as "purchase date",
    bdate as "price date",
    price
from diffs
where sr = 1

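Example: https://www.db-fiddle.com/f/dwQ9EXmp1SdpNpxyV1wc6M/0

Explanation

I joined the two tables to compute the difference between each purchase date and each price date, then ranked the price dates that fall on or before the purchase by how close they are. Rank 1 is the closest date. Finally, I pull out the rows with rank 1.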
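
As a more compact variant of the same "rank, then keep rank 1" idea, PostgreSQL's `DISTINCT ON` can keep the first row per purchase directly. This is a sketch, not part of the original answer; the two CTEs only stand in for the real `purchases` and `product_prices` tables so that the statement runs on its own.

```sql
WITH product_prices (product, price, date) AS (
    VALUES ('apple', 10, DATE '2014-03-01'),
           ('apple', 20, DATE '2014-05-02'),
           ('egg',    2, DATE '2014-03-03'),
           ('egg',    4, DATE '2015-10-12')
),
purchases ("user", product, date) AS (
    VALUES ('John', 'apple', DATE '2014-03-02'),
           ('John', 'apple', DATE '2014-06-03'),
           ('John', 'egg',   DATE '2014-08-13'),
           ('John', 'egg',   DATE '2016-08-13')
)
-- Keep one row per purchase: the first one in ORDER BY order,
-- i.e. the price with the latest date on or before the purchase.
SELECT DISTINCT ON (pu."user", pu.product, pu.date)
       pu."user" AS name,
       pu.product,
       pu.date   AS purchase_date,
       pp.date   AS price_date,
       pp.price
FROM purchases AS pu
JOIN product_prices AS pp
  ON pp.product = pu.product
 AND pp.date <= pu.date
ORDER BY pu."user", pu.product, pu.date, pp.date DESC;
```

With the sample data from the question, this pairs each purchase with the price dated 2014-03-01, 2014-05-02, 2014-03-03 and 2015-10-12 respectively. The `DISTINCT ON` expressions must match the leading `ORDER BY` expressions, which is why the purchase columns come first in the sort.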

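Answer 2: (score: 0)

This is a great place to use date ranges! We know the start date of each price range, and we can use a window function to get the next date. At that point, it is easy to determine the price on any given day.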

with price_ranges as 
    (select product, 
            price, 
            date as price_date, 
            daterange(date, lead(date, 1) 
               OVER (partition by product order by date), '[)'
            ) as valid_price_range from product_prices
     )
select "user" as name, 
       purchases.product, 
       purchases.date, 
       price_date, 
       price
from purchases
join price_ranges on purchases.product = price_ranges.product
and purchases.date <@ price_ranges.valid_price_range
order by purchases.date;
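
To make the '[)' bounds in that query concrete, here is a small sketch of how the `<@` containment operator behaves at the edges of a range. The dates are taken from the question's sample data; the column aliases are just labels for this illustration.

```sql
SELECT DATE '2014-03-01' <@ daterange('2014-03-01', '2014-05-02', '[)') AS on_lower_bound,  -- true: the start date is included
       DATE '2014-05-02' <@ daterange('2014-03-01', '2014-05-02', '[)') AS on_upper_bound,  -- false: the end date is excluded
       DATE '2016-08-13' <@ daterange('2015-10-12', NULL, '[)')         AS in_open_range;   -- true: a NULL upper bound means unbounded
```

The excluded upper bound is what keeps consecutive price ranges from overlapping on the day a price changes, and the NULL returned by lead() for the last price of each product is what makes the current price valid indefinitely.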

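Answer 3: (score: 0)

Look very carefully at the scalar subquery. It is not correlated back to the outer query. In other words, it returns the same result every time: the latest date in the product_prices table. Period. Consider the query out of context: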

SELECT date
FROM product_prices
ORDER BY date DESC LIMIT 1

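It has two problems:

  1. It returns 2015-10-12 for every row in the join, and nothing was purchased on that date, hence the nulls.
  2. Your closest approximation is date equality. Unless you have a product_prices row for every product on every date, you will always miss. "Nearest" implies distance and ranking.

WITH close_prices_by_purchase AS (
    SELECT
      p."user" AS name,        -- "user" is a reserved word, so it must be quoted
      p.product,
      p.date   AS purchase_date,
      pp.date  AS price_date,
      pp.price,
      -- rank the candidate prices of each purchase, most recent price date first
      row_number() OVER (
        PARTITION BY p."user", p.product, p.date
        ORDER BY pp.date DESC
      ) AS distance
    FROM purchases AS p
    INNER JOIN product_prices AS pp ON pp.product = p.product
    WHERE pp.date <= p.date    -- only prices already in effect on the purchase date
)
SELECT name, product, purchase_date, price_date, price
FROM close_prices_by_purchase
WHERE distance = 1; -- shortest distance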
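
Since the core problem described above is that the subquery is not correlated with the outer query, another option is to correlate it with a `LATERAL` join (PostgreSQL 9.3 or later). This is a sketch rather than part of the original answer; the CTEs stand in for the real tables so that the statement runs on its own, and the alias `p2` is introduced only for illustration.

```sql
WITH product_prices (product, price, date) AS (
    VALUES ('apple', 10, DATE '2014-03-01'),
           ('apple', 20, DATE '2014-05-02'),
           ('egg',    2, DATE '2014-03-03'),
           ('egg',    4, DATE '2015-10-12')
),
purchases ("user", product, date) AS (
    VALUES ('John', 'apple', DATE '2014-03-02'),
           ('John', 'apple', DATE '2014-06-03'),
           ('John', 'egg',   DATE '2014-08-13'),
           ('John', 'egg',   DATE '2016-08-13')
)
SELECT pu."user" AS name,
       pu.product,
       pu.date   AS purchase_date,
       pp.date   AS price_date,
       pp.price
FROM purchases AS pu
LEFT JOIN LATERAL (
    -- Runs once per purchase row: the latest price of the same product
    -- dated on or before that purchase.
    SELECT p2.date, p2.price
    FROM product_prices AS p2
    WHERE p2.product = pu.product
      AND p2.date <= pu.date
    ORDER BY p2.date DESC
    LIMIT 1
) AS pp ON true
ORDER BY pu.product, pu.date;
```

With the question's sample data this returns the four rows of the desired table. Because it is a LEFT JOIN, a purchase made before the first known price of its product would still appear, with NULL in the price columns.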