Combine the latest entries from multiple tables

Time: 2015-03-29 20:02:45

Tags: sql postgresql aggregate-functions greatest-n-per-group crosstab

I have a master table containing many IDs:

ID  ...
0   ...
1   ...

and several tables (e.g. vtbl1, vtbl2, vtbl3), each containing a foreign key to master, a timestamp and a value:

ID  Timestamp    Value
0   01/01/01..   2
1   01/01/02..   7
0   01/01/03..   5

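For each ID in master, I want one or more rows holding the latest entry from each v... table (null if a table has no entry), with entries that share a timestamp grouped into the same row: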

ID  Timestamp    vtbl1.Value   vtbl2.Value   vtbl3.value
0   01/01/03..   5             2
0   01/01/01..                               4
1   01/01/02..   7             4             9

I'm sure this is simple, but my SQL is rusty and I keep going around in circles. Any help would be much appreciated.

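Clarification

The values come from one or more sensors, each able to read one or more values. So for each ID, the latest value in each value table is treated as the current system state for that ID. If timestamps match, the values count as a single update.

I need the minimal set of updates per ID required to give a complete data set for the current state.

Values may also be of different types.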

3 answers:

Answer 0 (score: 1)

If I understand your question correctly, one option is to use conditional aggregation with union all:

select id, timestamp, 
       max(case when tbl = 'tbl1' then value end) t1value,
       max(case when tbl = 'tbl2' then value end) t2value,
       max(case when tbl = 'tbl3' then value end) t3value
from (
    select id, timestamp, value, 'tbl1' tbl
    from tbl1
    union all
    select id, timestamp, value, 'tbl2' tbl
    from tbl2
    union all
    select id, timestamp, value, 'tbl3' tbl
    from tbl3
) t
group by id, timestamp
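To see what this query returns, here is a minimal sketch that runs it with Python's built-in sqlite3 module on hypothetical sample data modeled on the question. SQLite stands in for Postgres here, dates are written as ISO strings so they sort chronologically, and the column is named ts instead of timestamp; none of this is from the answer itself.

```python
import sqlite3

# Hypothetical sample data modeled on the question; ISO dates sort by time as text
DATA = {
    "tbl1": [(0, "2001-01-01", 2), (1, "2001-01-02", 7), (0, "2001-01-03", 5)],
    "tbl2": [(0, "2001-01-03", 2), (1, "2001-01-02", 4)],
    "tbl3": [(0, "2001-01-01", 4), (1, "2001-01-02", 9)],
}

con = sqlite3.connect(":memory:")
for name, data in DATA.items():
    con.execute(f"CREATE TABLE {name} (id INTEGER, ts TEXT, value INTEGER)")
    con.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", data)

# UNION ALL tags each row with its source table; CASE inside max() pivots it into a column
QUERY = """
SELECT id, ts,
       max(CASE WHEN tbl = 'tbl1' THEN value END) AS t1value,
       max(CASE WHEN tbl = 'tbl2' THEN value END) AS t2value,
       max(CASE WHEN tbl = 'tbl3' THEN value END) AS t3value
FROM (
    SELECT id, ts, value, 'tbl1' AS tbl FROM tbl1
    UNION ALL
    SELECT id, ts, value, 'tbl2' AS tbl FROM tbl2
    UNION ALL
    SELECT id, ts, value, 'tbl3' AS tbl FROM tbl3
) t
GROUP BY id, ts
ORDER BY id, ts
"""

rows = con.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With this data the first row is (0, '2001-01-01', 2, None, 4): the older tbl1 reading 2 still shows up, because nothing in this version filters for the latest entry per table.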

Or, if there are multiple records per id and you want the value with the latest timestamp, you can include row_number() in the subquery:

select id, timestamp, 
       max(case when tbl = 'tbl1' then value end) t1value,
       max(case when tbl = 'tbl2' then value end) t2value,
       max(case when tbl = 'tbl3' then value end) t3value
from (
    select id, timestamp, value, 'tbl1' tbl,
        row_number() over (partition by id order by timestamp desc) rn
    from tbl1
    union all
    select id, timestamp, value, 'tbl2' tbl,
        row_number() over (partition by id order by timestamp desc) rn
    from tbl2
    union all
    select id, timestamp, value, 'tbl3' tbl,
        row_number() over (partition by id order by timestamp desc) rn
    from tbl3
) t
where rn = 1
group by id, timestamp
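A minimal sketch of the row_number() variant, again using Python's sqlite3 module and the same hypothetical sample data (column ts instead of timestamp). It assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Same hypothetical sample data as before (ISO dates sort chronologically)
DATA = {
    "tbl1": [(0, "2001-01-01", 2), (1, "2001-01-02", 7), (0, "2001-01-03", 5)],
    "tbl2": [(0, "2001-01-03", 2), (1, "2001-01-02", 4)],
    "tbl3": [(0, "2001-01-01", 4), (1, "2001-01-02", 9)],
}

con = sqlite3.connect(":memory:")
for name, data in DATA.items():
    con.execute(f"CREATE TABLE {name} (id INTEGER, ts TEXT, value INTEGER)")
    con.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", data)

# rn = 1 marks the newest row per id within each table
QUERY = """
SELECT id, ts,
       max(CASE WHEN tbl = 'tbl1' THEN value END) AS t1value,
       max(CASE WHEN tbl = 'tbl2' THEN value END) AS t2value,
       max(CASE WHEN tbl = 'tbl3' THEN value END) AS t3value
FROM (
    SELECT id, ts, value, 'tbl1' AS tbl,
           row_number() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
    FROM tbl1
    UNION ALL
    SELECT id, ts, value, 'tbl2' AS tbl,
           row_number() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
    FROM tbl2
    UNION ALL
    SELECT id, ts, value, 'tbl3' AS tbl,
           row_number() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
    FROM tbl3
) t
WHERE rn = 1
GROUP BY id, ts
ORDER BY id, ts
"""

rows = con.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The older tbl1 reading (2 at 2001-01-01) is gone, and id 0 still spans two rows because its latest tbl3 reading has an older timestamp than its tbl1 and tbl2 readings, which matches the layout the question asks for.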

This can get tricky if the max(timestamp) values differ between the subtables. Which one would you join on then?

Answer 1 (score: 0)

select m.*, v1.value as t1_val, v2.value as t2_val, v3.value as t3_val
  from master m
  left join (select x.*
               from vtbl1 x
               join (select id, max(timestamp) as last_ts
                      from vtbl1
                     group by id) y
                 on x.id = y.id
                and x.timestamp = y.last_ts) v1
    on m.id = v1.id
  left join (select x.*
               from vtbl2 x
               join (select id, max(timestamp) as last_ts
                      from vtbl2
                     group by id) y
                 on x.id = y.id
                and x.timestamp = y.last_ts) v2
    on m.id = v2.id
  left join (select x.*
               from vtbl3 x
               join (select id, max(timestamp) as last_ts
                      from vtbl3
                     group by id) y
                 on x.id = y.id
                and x.timestamp = y.last_ts) v3
    on m.id = v3.id
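A minimal sketch of this answer's approach, run through Python's sqlite3 module with hypothetical sample data (column ts instead of timestamp). The helper latest() is a hypothetical convenience that only generates the three repeated LEFT JOIN blocks; an extra master row without readings (id 2) shows what the LEFT JOINs do.

```python
import sqlite3

# Hypothetical sample data; master id 2 has no readings at all
DATA = {
    "vtbl1": [(0, "2001-01-01", 2), (1, "2001-01-02", 7), (0, "2001-01-03", 5)],
    "vtbl2": [(0, "2001-01-03", 2), (1, "2001-01-02", 4)],
    "vtbl3": [(0, "2001-01-01", 4), (1, "2001-01-02", 9)],
}

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE master (id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO master VALUES (?)", [(0,), (1,), (2,)])
for name, data in DATA.items():
    con.execute(f"CREATE TABLE {name} (id INTEGER, ts TEXT, value INTEGER)")
    con.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", data)


def latest(table, alias):
    """LEFT JOIN the rows of `table` whose ts equals that id's max(ts)."""
    return f"""LEFT JOIN (SELECT x.*
             FROM {table} x
             JOIN (SELECT id, max(ts) AS last_ts FROM {table} GROUP BY id) y
               ON x.id = y.id AND x.ts = y.last_ts) {alias}
  ON m.id = {alias}.id"""


QUERY = f"""SELECT m.id, v1.value AS t1_val, v2.value AS t2_val, v3.value AS t3_val
FROM master m
{latest('vtbl1', 'v1')}
{latest('vtbl2', 'v2')}
{latest('vtbl3', 'v3')}
ORDER BY m.id"""

rows = con.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

This returns one row per master id, (0, 5, 2, 4), (1, 7, 4, 9) and (2, None, None, None). The timestamps are dropped, so the result no longer shows that id 0's vtbl3 value is older than the others; and if two rows in one table tie on the max timestamp for an id, that id appears more than once.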

Answer 2 (score: 0)

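The fastest query technique depends on the data distribution. DISTINCT ON is a simple solution in Postgres and works well for few values per id in each subtable. But guessing from your description, I expect many rows per id, so I suggest a solution with LATERAL joins. This requires Postgres 9.3+.

For your case, which is already not that simple, there is one more complication:

"Values may also be of different types"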

Alternative 1

Cast all values to text. Every data type can be cast to text.

Basic query

SELECT m.id, v.timestamp, 1 AS tbl, v.value  -- simple int as table id
FROM   master m
     , LATERAL (
   SELECT timestamp, value::text  -- cast to text
   FROM   vtbl1
   WHERE  id = m.id  -- lateral reference
   ORDER  BY timestamp DESC NULLS LAST
   LIMIT  1
   ) v

UNION ALL
SELECT m.id, v.timestamp, 2 AS tbl, v.value  -- ascending without gaps
FROM   master m
     , LATERAL (
   SELECT timestamp, value::text
   FROM   vtbl2
   WHERE  id = m.id
   ORDER  BY timestamp DESC NULLS LAST
   LIMIT  1
   ) v

UNION ALL
SELECT m.id, v.timestamp, 3 AS tbl, v.value
FROM  ...
;
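SQLite has no LATERAL, so the following minimal sketch swaps in a correlated subquery that picks the rowid of the newest row per id (ORDER BY ts DESC LIMIT 1), which plays the role of the LATERAL subquery above; CAST(... AS TEXT) mirrors value::text. The data is hypothetical: each value table gets a different value type to show why the cast is needed, and the column is named ts instead of timestamp.

```python
import sqlite3

# Hypothetical sample data: each value table has its own value type
TABLES = {
    "vtbl1": ("INTEGER", [(0, "2001-01-01", 2), (1, "2001-01-02", 7), (0, "2001-01-03", 5)]),
    "vtbl2": ("REAL", [(0, "2001-01-03", 2.5), (1, "2001-01-02", 4.5)]),
    "vtbl3": ("TEXT", [(0, "2001-01-01", "low"), (1, "2001-01-02", "high")]),
}

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE master (id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO master VALUES (?)", [(0,), (1,), (2,)])  # id 2: no readings
for name, (vtype, data) in TABLES.items():
    con.execute(f"CREATE TABLE {name} (id INTEGER, ts TEXT, value {vtype})")
    con.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", data)

# One branch per table: newest row per master id, tagged with a table number,
# value cast to text so all branches share one column type
PARTS = [
    f"""SELECT m.id, v.ts, {n} AS tbl, CAST(v.value AS TEXT) AS value
    FROM master m
    JOIN vtbl{n} v ON v.rowid = (
        SELECT rowid FROM vtbl{n} WHERE id = m.id ORDER BY ts DESC LIMIT 1)"""
    for n in (1, 2, 3)
]
QUERY = "\nUNION ALL\n".join(PARTS) + "\nORDER BY 1, 3"  # by id, then table number

rows = con.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because this is an inner join, id 2 (no readings) produces no rows, just like the comma (CROSS JOIN) LATERAL form in the answer; in Postgres, LEFT JOIN LATERAL (...) v ON true would keep such ids.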

An index on (id, timestamp) in each subtable makes this fast. Ideally in this form (adding value only helps if you get index-only scans out of it):

CREATE INDEX vtbl1_combo_idx ON vtbl1 (id, timestamp DESC NULLS LAST, value)

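1a. Aggregate (pseudo-crosstab)

To format the result as requested, use aggregate functions over CASE expressions in Postgres 9.3 or older (as demonstrated by @sgeddes), or (better) the new aggregate FILTER clause in Postgres 9.4+: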

SELECT id, timestamp
     , max(value) FILTER (WHERE tbl = 1) AS val1
     , max(value) FILTER (WHERE tbl = 2) AS val2
     , ...
FROM ( <query from above> ) t
GROUP  BY 1, 2;
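A minimal sketch of the FILTER aggregation, run in SQLite through Python's sqlite3 module (aggregate FILTER needs SQLite 3.30 or later). The long-format rows (id, ts, tbl, value) in the CTE are hypothetical sample output of the basic query, with every value already cast to text.

```python
import sqlite3

# Hypothetical long-format input: (id, ts, tbl, value), values already cast to text
QUERY = """
WITH t(id, ts, tbl, value) AS (
    VALUES (0, '2001-01-03', 1, '5'),   (0, '2001-01-03', 2, '2.5'),
           (0, '2001-01-01', 3, 'low'), (1, '2001-01-02', 1, '7'),
           (1, '2001-01-02', 2, '4.5'), (1, '2001-01-02', 3, 'high')
)
SELECT id, ts,
       max(value) FILTER (WHERE tbl = 1) AS val1,  -- only rows from table 1
       max(value) FILTER (WHERE tbl = 2) AS val2,
       max(value) FILTER (WHERE tbl = 3) AS val3
FROM t
GROUP BY id, ts
ORDER BY id, ts
"""

rows = sqlite3.connect(":memory:").execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each (id, ts) pair becomes one row; columns whose table has no reading at that timestamp stay NULL (None in Python).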

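1b. Crosstab

An actual crosstab (also called "pivot" in other RDBMS) should be considerably faster. You need the additional module tablefunc installed.

There is one special difficulty here: we have a composite "row name" (id, timestamp), but the function expects a single column as row name. So we substitute a surrogate key computed with dense_rank(), which gives all rows sharing the same (id, timestamp) the same number, and don't display that surrogate key in the result:

SELECT id, timestamp, val1, val2, val3, ...
 -- normally SELECT * is enough; explicit list to filter rn
FROM  crosstab(
    $$
    SELECT dense_rank() OVER (ORDER BY id, timestamp DESC NULLS LAST) AS rn  -- same rn for same (id, timestamp)
         , id, timestamp, tbl, value
    FROM  ( <query from above> ) t
    ORDER  BY 1
    $$
  , 'SELECT generate_series(1,3)'  -- replace 3 with highest table nr.
    ) AS ct (
    rn bigint, id int, timestamp date  -- dense_rank() returns bigint
  , val1 text, val2 text, val3 text, ...);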
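The two-parameter crosstab(source_sql, category_sql) form can be hard to picture. Below is a rough pure-Python model of its documented behavior, not the tablefunc implementation: one output row per row name, extra columns copied from the first row of each group, and each value placed in the column of its category (None where missing). The helpers dense_rank_key and crosstab_emulate are hypothetical, and the input rows are assumed sample output of the basic query, sorted by id, ts descending.

```python
from itertools import groupby

# Hypothetical long-format rows (id, ts, tbl, value), sorted by id, ts DESC
SOURCE = [
    (0, "2001-01-03", 1, "5"),
    (0, "2001-01-03", 2, "2.5"),
    (0, "2001-01-01", 3, "low"),
    (1, "2001-01-02", 1, "7"),
    (1, "2001-01-02", 2, "4.5"),
    (1, "2001-01-02", 3, "high"),
]


def dense_rank_key(rows):
    """Prefix each row with a surrogate row name; equal (id, ts) pairs share one number."""
    ranked, rank, prev = [], 0, None
    for row in rows:
        if row[:2] != prev:
            rank, prev = rank + 1, row[:2]
        ranked.append((rank,) + row)
    return ranked


def crosstab_emulate(source, categories):
    """Group consecutive rows by row name (first column) and spread values by category."""
    out = []
    for rn, group in groupby(source, key=lambda r: r[0]):
        group = list(group)
        extras = group[0][1:-2]  # columns between row name and category, from first row
        cells = {cat: None for cat in categories}
        for row in group:
            if row[-2] in cells:  # values of unknown categories are ignored
                cells[row[-2]] = row[-1]
        out.append((rn,) + extras + tuple(cells[c] for c in categories))
    return out


result = crosstab_emulate(dense_rank_key(SOURCE), categories=[1, 2, 3])
for row in result:
    print(row)
```

If every input row got its own number instead (which is what row_number() would produce), the two readings for id 0 at 2001-01-03 would land in two separate output rows instead of one.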


Alternative 2

Simpler, but possibly just as fast, and it preserves the original data types:

SELECT id, timestamp
     , max(val1) AS val1, max(val2) AS val2, max(val3) AS val3, ...
FROM  (
   SELECT m.id, v.timestamp
        , v.value AS val1, NULL::int AS val2, NULL::numeric AS val3, ...   
          -- list all values with actual data type
   FROM   master m
        , LATERAL (
      SELECT timestamp, value
      FROM   vtbl1
      WHERE  id = m.id
      ORDER  BY timestamp DESC NULLS LAST
      LIMIT  1
      ) v

   UNION ALL
   SELECT m.id, v.timestamp
        , NULL, v.value, NULL, ...  -- column names & data types defined in first SELECT
   FROM   master m
        , LATERAL (
      SELECT timestamp, value
      FROM   vtbl2
      WHERE  id = m.id
      ORDER  BY timestamp DESC NULLS LAST
      LIMIT  1
      ) v

   UNION ALL
   SELECT m.id, v.timestamp
        , NULL, NULL, v.value, ...
   FROM  ...
   ) t
GROUP  BY 1, 2
ORDER  BY 1, 2;
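A minimal sketch of Alternative 2 in SQLite through Python's sqlite3 module, with hypothetical mixed-type sample data (column ts instead of timestamp). A correlated max(ts) subquery stands in for the LATERAL subquery; unlike LIMIT 1 it would keep ties. The master join is left out for brevity. Postgres needs the typed NULLs in the first SELECT, whereas SQLite keeps each value's own storage class anyway.

```python
import sqlite3

# Hypothetical sample data: each value table has its own value type
TABLES = {
    "vtbl1": ("INTEGER", [(0, "2001-01-01", 2), (1, "2001-01-02", 7), (0, "2001-01-03", 5)]),
    "vtbl2": ("REAL", [(0, "2001-01-03", 2.5), (1, "2001-01-02", 4.5)]),
    "vtbl3": ("TEXT", [(0, "2001-01-01", "low"), (1, "2001-01-02", "high")]),
}

con = sqlite3.connect(":memory:")
for name, (vtype, data) in TABLES.items():
    con.execute(f"CREATE TABLE {name} (id INTEGER, ts TEXT, value {vtype})")
    con.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", data)

# One output column per table, NULL in the others; max() collapses each (id, ts) group
QUERY = """
SELECT id, ts, max(val1) AS val1, max(val2) AS val2, max(val3) AS val3
FROM (
    SELECT v.id AS id, v.ts AS ts, v.value AS val1, NULL AS val2, NULL AS val3
    FROM vtbl1 v WHERE v.ts = (SELECT max(ts) FROM vtbl1 WHERE id = v.id)
    UNION ALL
    SELECT v.id, v.ts, NULL, v.value, NULL
    FROM vtbl2 v WHERE v.ts = (SELECT max(ts) FROM vtbl2 WHERE id = v.id)
    UNION ALL
    SELECT v.id, v.ts, NULL, NULL, v.value
    FROM vtbl3 v WHERE v.ts = (SELECT max(ts) FROM vtbl3 WHERE id = v.id)
) t
GROUP BY id, ts
ORDER BY id, ts
"""

rows = con.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The values come back as int, float and str respectively, with no cast to text involved.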

Aside: never use basic type names or reserved words (in standard SQL) such as timestamp as identifiers.