Question

I am modelling (in Postgres 9.6.1 / PostGIS 2.3.1) a booking system for local services provided by suppliers:

create table supplier (
    id                serial primary key,
    name              text not null check (char_length(name) < 280),
    type              service_type,
    duration          interval,
    ...
    geo_position      geography(POINT,4326)
    ...
);

Each supplier keeps a calendar with timeslots when he/she is available to be booked:

create table timeslot (
    id                 serial primary key,
    supplier_id        integer not null references supplier(id),
    slot               tstzrange not null,

    constraint supplier_overlapping_timeslot_not_allowed
    exclude using gist (supplier_id with =, slot with &&)
);

When a customer wants to know which nearby suppliers are available to book at a certain time, I create a view and a function:

create view supplier_slots as
    select
        supplier.name, supplier.type, supplier.geo_position, supplier.duration, ...
        timeslot.slot
    from
        supplier, timeslot
    where
        supplier.id = timeslot.supplier_id;


create function find_suppliers(wantedType service_type, near_latitude text, near_longitude text, at_time timestamptz)
returns setof supplier_slots as $$
declare
    nearpoint geography;
begin
    -- WKT coordinate order is POINT(longitude latitude)
    nearpoint := ST_GeographyFromText('SRID=4326;POINT(' || near_longitude || ' ' || near_latitude || ')');
    return query
        select * from supplier_slots
        where type = wantedType
            and tstzrange(at_time, at_time + duration) <@ slot
        order by ST_Distance( nearpoint, geo_position )
        limit 100;
end;
$$ language plpgsql;

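This all works really well.

Now, for the suppliers that did not have a bookable timeslot at the requested time, I would like to find their closest available timeslots, before and after the requested at_time, also sorted by distance.

This has my mind spinning a bit, and I can't find a suitable operator to give me the nearest tsrange.

Any ideas on the smartest way to do this?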

Answer 1

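The solution depends on the exact definition of what you want.

Schema

I suggest these slightly adapted table definitions to make the task simpler, enforce integrity and improve performance: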

CREATE TABLE supplier (
   supplier_id  serial PRIMARY KEY,
   supplier     text NOT NULL CHECK (length(supplier) < 280),
   type         service_type,
   duration     interval,
   geo_position geography(POINT,4326)
);

CREATE TABLE timeslot (
   timeslot_id  serial PRIMARY KEY,
   supplier_id  integer NOT NULL REFERENCES supplier,  -- references supplier(supplier_id)
   slot_a       timestamptz NOT NULL,
   slot_z       timestamptz NOT NULL,
   CONSTRAINT   timeslot_range_valid CHECK (slot_a < slot_z),
   CONSTRAINT   timeslot_no_overlapping
     EXCLUDE USING gist (supplier_id WITH =, tstzrange(slot_a, slot_z) WITH &&)
);

CREATE INDEX timeslot_slot_z ON timeslot (supplier_id, slot_z);
CREATE INDEX supplier_geo_position_gist ON supplier USING gist (geo_position);

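Save two timestamptz columns slot_a and slot_z instead of the tstzrange column slot, and adapt the constraints accordingly. All ranges are then automatically treated with the default inclusive lower and exclusive upper bound, which avoids corner-case errors and headaches.

Collateral benefit: only 16 bytes for two timestamptz values instead of 25 bytes (32 with padding) for the tstzrange.
All queries you might have had on slot keep working with tstzrange(slot_a, slot_z) as a drop-in replacement.
Add an index on (supplier_id, slot_z) for the query at hand, and a spatial index on supplier.geo_position (which you probably have already).

Depending on the data distribution in type, a couple of partial indexes for the types most common in your queries may help performance: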
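
To illustrate the drop-in replacement, here is a minimal sketch of the containment test from the question, written against the two columns. The timestamp literals are made-up sample values, and whether the planner uses the GiST index behind the exclusion constraint for this depends on your data and settings.

```sql
-- Slots that fully contain a requested half-hour window (sample values).
-- tstzrange() with two arguments builds a range with inclusive lower
-- and exclusive upper bound: '[)'.
SELECT timeslot_id, supplier_id, slot_a, slot_z
FROM   timeslot
WHERE  tstzrange(timestamptz '2017-01-10 10:00+00'
               , timestamptz '2017-01-10 10:30+00') <@ tstzrange(slot_a, slot_z);
```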
```
CREATE INDEX supplier_geo_type_foo_gist ON supplier USING gist (geo_position)
WHERE type = 'foo'::service_type;
```

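Query / Function

This query finds the X closest suppliers that offer the correct service_type (100 in the example), each with its single closest matching time slot (closeness is defined as the time distance to the start of the slot). I combined this with actually matching slots, which may or may not be what you need.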

CREATE FUNCTION f_suppliers_nearby(_type service_type, _lat text, _lon text, at_time timestamptz)
  RETURNS TABLE (supplier_id  int
               , name         text
               , duration     interval
               , geo_position geography(POINT,4326)
               , distance     float 
               , timeslot_id  int
               , slot_a       timestamptz
               , slot_z       timestamptz
               , time_dist    interval
   ) AS
$func$
   WITH sup_nearby AS (  -- nearest suppliers, each with its matching or next later slot
      SELECT s.supplier_id, s.supplier AS name, s.duration, s.geo_position
             -- WKT coordinate order is POINT(longitude latitude)
           , ST_Distance(ST_GeographyFromText('SRID=4326;POINT(' || _lon || ' ' || _lat || ')')
                          , s.geo_position) AS distance
           , t.timeslot_id, t.slot_a, t.slot_z
           , CASE WHEN t.slot_a IS NOT NULL
                  THEN GREATEST(t.slot_a - at_time, interval '0') END AS time_dist
      FROM   supplier s
      LEFT   JOIN LATERAL (
         SELECT *
         FROM   timeslot
         WHERE  supplier_id = s.supplier_id
         AND    slot_z > at_time + s.duration  -- excl. upper bound
         ORDER  BY slot_z
         LIMIT  1
         ) t ON true
      WHERE  s.type = _type
      ORDER  BY distance  -- output column computed above
      LIMIT  100
      )
   SELECT *
   FROM  (
      SELECT DISTINCT ON (supplier_id) *  -- 1 slot per supplier
      FROM  (
         TABLE sup_nearby  -- matching or later slot

         UNION ALL         -- earlier slot
         SELECT s.supplier_id, s.name, s.duration, s.geo_position
              , s.distance
              , t.timeslot_id, t.slot_a, t.slot_z
              , GREATEST(at_time - t.slot_a, interval '0') AS time_dist
         FROM   sup_nearby s
         CROSS  JOIN LATERAL (  -- this time CROSS JOIN!
            SELECT *
            FROM   timeslot
            WHERE  supplier_id = s.supplier_id
            AND    slot_z <= at_time  -- excl. upper bound
            ORDER  BY slot_z DESC
            LIMIT  1
            ) t
         WHERE  s.time_dist IS DISTINCT FROM interval '0'  -- exact matches are done
         ) sub
      ORDER  BY supplier_id, time_dist  -- pick temporally closest slot per supplier
   ) sub
   ORDER  BY time_dist, distance;  -- matches first, ordered by distance; then misses, ordered by time distance

$func$  LANGUAGE sql;
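
A call might look like the sketch below. The enum label 'foo', the coordinates and the timestamp are placeholder values, not data from the question.

```sql
-- Hypothetical call: service type 'foo', a sample position (latitude, longitude
-- passed as text, matching the function signature) and a sample point in time.
SELECT supplier_id, name, distance, slot_a, slot_z, time_dist
FROM   f_suppliers_nearby('foo', '52.5200', '13.4050', '2017-01-10 10:00+00');
```

With the function as written, a time_dist of zero should mark a slot that already covers the requested time, and a NULL time_dist a supplier with no slot at all.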

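I didn't use your view supplier_slots and optimized for performance instead. The view may still be convenient. For backward compatibility, you can include tstzrange(slot_a, slot_z) AS slot.

The basic query to find the 100 closest suppliers is a textbook "K nearest neighbours" problem. A GiST index works well for this. Related: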
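
A backward-compatible view over the adapted schema could look roughly like this. It is only a sketch: it lists just the columns shown in the question (the elided ones are left out) and assumes the old view has been dropped or renamed first.

```sql
-- Rebuild the question's view on top of the two-column schema.
CREATE VIEW supplier_slots AS
SELECT s.supplier AS name, s.type, s.geo_position, s.duration
     , tstzrange(t.slot_a, t.slot_z) AS slot   -- default '[)' bounds
FROM   supplier s
JOIN   timeslot t USING (supplier_id);
```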

How do I query all rows within a 5-mile radius of my coordinates?
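
As a rough illustration of the nearest-neighbour part in isolation, the sketch below orders by the PostGIS distance operator `<->`, which a GiST index on geo_position can support. It assumes a PostGIS version where `<->` accepts geography (2.2 or later, as far as I know); the enum label 'foo' and the coordinates are placeholders.

```sql
-- The 100 suppliers of one type closest to a sample point.
-- Note the WKT coordinate order: POINT(longitude latitude).
SELECT s.supplier_id, s.supplier
FROM   supplier s
WHERE  s.type = 'foo'
ORDER  BY s.geo_position <-> ST_GeographyFromText('SRID=4326;POINT(13.4050 52.5200)')
LIMIT  100;
```

The filter on type matches the predicate of the partial index suggested above, so that index is a candidate for this query.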

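The additional task (finding the temporally nearest slot) can be split into two tasks: finding the next higher row and finding the next lower row. The core feature of the solution is two subqueries with ORDER BY slot_z LIMIT 1 and ORDER BY slot_z DESC LIMIT 1, which result in two very fast index scans.

I combined the first one with finding actual matches, which is a (smart, I think) optimization, but may distract from the actual solution.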
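
Stripped of everything else, the two lookups for a single supplier can be sketched as follows. The supplier id 1 and the timestamp are sample values; both branches are shaped so that the index on (supplier_id, slot_z) can serve them.

```sql
-- Next slot ending after the requested time (forward index scan).
(SELECT *
 FROM   timeslot
 WHERE  supplier_id = 1
 AND    slot_z > timestamptz '2017-01-10 10:00+00'
 ORDER  BY slot_z
 LIMIT  1)
UNION ALL
-- Last slot ending at or before the requested time (backward index scan).
(SELECT *
 FROM   timeslot
 WHERE  supplier_id = 1
 AND    slot_z <= timestamptz '2017-01-10 10:00+00'
 ORDER  BY slot_z DESC
 LIMIT  1);
```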

Postgres: How to find the nearest tsrange from a timestamp outside of the ranges?
