I have a master table containing a number of IDs:
ID ...
0 ...
1 ...
and several tables (e.g. vtbl1, vtbl2, vtbl3), each containing a foreign key to master, a timestamp and a value:
ID Timestamp Value
0 01/01/01.. 2
1 01/01/02.. 7
0 01/01/03.. 5
For each ID in master, I would like to get one or more rows that together hold the latest entry from each v... table (or null where a table has no entry), grouped by timestamp:
ID Timestamp vtbl1.Value vtbl2.Value vtbl3.value
0 01/01/03.. 5 2
0 01/01/01.. 4
1 01/01/02.. 7 4 9
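I'm sure this is simple, but my SQL is rusty and I keep going round in circles. Any help would be appreciated.
The values come from one or more sensors, each capable of reading one or more values. So the latest value for an ID in each value table is taken as the current system state for that ID. If timestamps match, they are considered to be one update.
I need the minimal set of updates per ID required to give a complete data set for the current state.
Values can also be of different types.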
Answer 0 (score: 1)
If I understand your question correctly, one option is to use conditional aggregation with union all:
select id, timestamp,
max(case when tbl = 'tbl1' then value end) t1value,
max(case when tbl = 'tbl2' then value end) t2value,
max(case when tbl = 'tbl3' then value end) t3value
from (
select id, timestamp, value, 'tbl1' tbl
from tbl1
union all
select id, timestamp, value, 'tbl2' tbl
from tbl2
union all
select id, timestamp, value, 'tbl3' tbl
from tbl3
) t
group by id, timestamp
Or, if there are multiple records per id and you only want the value with the latest timestamp from each table, you can include row_number() in the subqueries:
select id, timestamp,
max(case when tbl = 'tbl1' then value end) t1value,
max(case when tbl = 'tbl2' then value end) t2value,
max(case when tbl = 'tbl3' then value end) t3value
from (
select id, timestamp, value, 'tbl1' tbl,
row_number() over (partition by id order by timestamp desc) rn
from tbl1
union all
select id, timestamp, value, 'tbl2' tbl,
row_number() over (partition by id order by timestamp desc) rn
from tbl2
union all
select id, timestamp, value, 'tbl3' tbl,
row_number() over (partition by id order by timestamp desc) rn
from tbl3
) t
where rn = 1
group by id, timestamp
This could get difficult if the max(timestamp) value is not the same in each child table. Which one would you join on then?
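To make the second query concrete, here is a minimal runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for a real database server (an assumption, as are the sample rows), and it needs SQLite 3.25 or later for row_number(). Each subquery keeps only the latest row per id, and the outer query then pivots on (id, timestamp), so tables whose latest timestamps differ simply end up on separate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (id INTEGER, timestamp TEXT, value INTEGER);
    CREATE TABLE tbl2 (id INTEGER, timestamp TEXT, value INTEGER);
    CREATE TABLE tbl3 (id INTEGER, timestamp TEXT, value INTEGER);
    -- Hypothetical sample rows; ISO dates so that text order equals time order.
    INSERT INTO tbl1 VALUES (0, '2001-01-01', 2), (1, '2001-01-02', 7), (0, '2001-01-03', 5);
    INSERT INTO tbl2 VALUES (0, '2001-01-03', 2), (1, '2001-01-02', 4);
    INSERT INTO tbl3 VALUES (0, '2001-01-01', 4), (1, '2001-01-02', 9);
""")

LATEST_PIVOT = """
select id, timestamp,
       max(case when tbl = 'tbl1' then value end) as t1value,
       max(case when tbl = 'tbl2' then value end) as t2value,
       max(case when tbl = 'tbl3' then value end) as t3value
from (
    select id, timestamp, value, 'tbl1' as tbl,
           row_number() over (partition by id order by timestamp desc) as rn
    from tbl1
    union all
    select id, timestamp, value, 'tbl2' as tbl,
           row_number() over (partition by id order by timestamp desc) as rn
    from tbl2
    union all
    select id, timestamp, value, 'tbl3' as tbl,
           row_number() over (partition by id order by timestamp desc) as rn
    from tbl3
) t
where rn = 1              -- keep only the latest row per id and table
group by id, timestamp    -- rows with equal timestamps collapse into one update
order by id, timestamp desc
"""

rows = conn.execute(LATEST_PIVOT).fetchall()
for row in rows:
    print(row)
```

With this sample data, id 0 yields two rows (tbl3 was last updated earlier than tbl1 and tbl2), while id 1 yields a single row because all three tables share the same latest timestamp.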
Answer 1 (score: 0)
select m.*, v1.value as t1_val, v2.value as t2_val, v3.value as t3_val
from master m
left join (select x.*
from vtbl1 x
join (select id, max(timestamp) as last_ts
from vtbl1
group by id) y
on x.id = y.id
and x.timestamp = y.last_ts) v1
on m.id = v1.id
left join (select x.*
from vtbl2 x
join (select id, max(timestamp) as last_ts
from vtbl2
group by id) y
on x.id = y.id
and x.timestamp = y.last_ts) v2
on m.id = v2.id
left join (select x.*
from vtbl3 x
join (select id, max(timestamp) as last_ts
from vtbl3
group by id) y
on x.id = y.id
and x.timestamp = y.last_ts) v3
on m.id = v3.id
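Answer 2 (score: 0)
The fastest query technique depends on the distribution of values. DISTINCT ON is a simple solution in Postgres and works well for few rows per id in each child table. But judging from your description, I expect many rows per id, so I suggest a solution with LATERAL joins. Requires Postgres 9.3+.
Your case is already not trivial, and there is one more complication:
"Values can also be of different types"
Cast all values to text. Every data type can be cast to text.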
SELECT m.id, v.timestamp, 1 AS tbl, v.value -- simple int as table id
FROM master m
, LATERAL (
SELECT timestamp, value::text -- cast to text
FROM vtbl1
WHERE id = m.id -- lateral reference
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp, 2 AS tbl, v.value -- ascending without gaps
FROM master m
, LATERAL (
SELECT timestamp, value::text
FROM vtbl2
WHERE id = m.id
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp, 3 AS tbl, value
FROM ...
;
An index on (id, timestamp) for each child table is the key to making this fast. Ideally use this form (appending value only helps if you get index-only scans out of it):
CREATE INDEX vtbl1_combo_idx ON vtbl1 (id, timestamp DESC NULLS LAST, value)
To format the result as requested, use aggregate functions with CASE expressions in Postgres 9.3 or older (as demonstrated by @sgeddes) or, better, the new aggregate FILTER clause in Postgres 9.4+:
SELECT id, timestamp
, max(value) FILTER (WHERE tbl = 1) AS val1
, max(value) FILTER (WHERE tbl = 2) AS val2
, ...
FROM ( <query from above> ) t
GROUP BY 1, 2;
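As a quick check that the two formatting styles agree, here is a small sketch that runs both against SQLite through Python's sqlite3 module. SQLite stands in for Postgres here (an assumption; aggregate FILTER needs SQLite 3.30 or later), and the table latest with its column ts is a hypothetical stand-in for the output of the UNION ALL query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the UNION ALL result: one row per id and
# table number, holding that table's latest value cast to text.
conn.execute("CREATE TABLE latest (id INTEGER, ts TEXT, tbl INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO latest VALUES (?, ?, ?, ?)",
    [
        (0, "2001-01-03", 1, "5"),
        (0, "2001-01-03", 2, "2"),
        (0, "2001-01-01", 3, "4"),
        (1, "2001-01-02", 1, "7"),
        (1, "2001-01-02", 2, "4"),
        (1, "2001-01-02", 3, "9"),
    ],
)

# Aggregate FILTER clause: each max() only sees rows of one table.
with_filter = conn.execute("""
    SELECT id, ts
         , max(value) FILTER (WHERE tbl = 1) AS val1
         , max(value) FILTER (WHERE tbl = 2) AS val2
         , max(value) FILTER (WHERE tbl = 3) AS val3
    FROM latest
    GROUP BY id, ts
    ORDER BY id, ts DESC
""").fetchall()

# Equivalent CASE form for versions without FILTER.
with_case = conn.execute("""
    SELECT id, ts
         , max(CASE WHEN tbl = 1 THEN value END) AS val1
         , max(CASE WHEN tbl = 2 THEN value END) AS val2
         , max(CASE WHEN tbl = 3 THEN value END) AS val3
    FROM latest
    GROUP BY id, ts
    ORDER BY id, ts DESC
""").fetchall()

for row in with_filter:
    print(row)
```

Both queries return the same rows; FILTER mainly reads more cleanly than the CASE form.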
An actual cross tabulation (also called "pivot" in other RDBMS) should be considerably faster. You need to install the additional module tablefunc first.
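There is one special difficulty here: we have a composite "row name" (id, timestamp), but the function expects a single column as row name. So we substitute a surrogate key from dense_rank(), which gives all rows sharing the same (id, timestamp) the same number so that crosstab() can merge them, and do not show that key in the result:
SELECT id, timestamp, val1, val2, val3, ...
-- normally SELECT * is enough; explicit list to filter rn
FROM crosstab(
$$
SELECT dense_rank() OVER (ORDER BY id, timestamp DESC NULLS LAST) AS rn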
, id, timestamp, tbl, value
FROM ( <query from above> ) t
ORDER BY 1
$$
, 'SELECT generate_series(1,3)' -- replace 3 with highest table nr.
) AS ct (
rn int, id int, timestamp date
, val1 text, val2 text, val3 text, ...);
Simpler, but probably just as fast, and it preserves the original data types:
SELECT id, timestamp
, max(val1) AS val1, max(val2) AS val2, max(val3) AS val3, ...
FROM (
SELECT m.id, v.timestamp
, v.value AS val1, NULL::int AS val2, NULL::numeric AS val3, ...
-- list all values with actual data type
FROM master m
, LATERAL (
SELECT timestamp, value
FROM vtbl1
WHERE id = m.id
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp
, NULL, v.value, NULL, ... -- column names & data types defined in first SELECT
FROM master m
, LATERAL (
SELECT timestamp, value
FROM vtbl2
WHERE id = m.id
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp
, NULL, NULL, v.value, ...
FROM ...
) t
GROUP BY 1, 2
ORDER BY 1, 2;
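The NULL-padded UNION ALL idea can be tried without a Postgres server. The sketch below uses Python's sqlite3 module, so it swaps the LATERAL subqueries for correlated max() subqueries, which SQLite supports; that swap, the column name ts and the sample rows are all assumptions made for illustration. The point it shows is that each value column keeps its own type through the pivot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (id INTEGER PRIMARY KEY);
    CREATE TABLE vtbl1 (id INTEGER, ts TEXT, value INTEGER);
    CREATE TABLE vtbl2 (id INTEGER, ts TEXT, value TEXT);
    CREATE TABLE vtbl3 (id INTEGER, ts TEXT, value REAL);
    -- Hypothetical sample rows with a different value type per table.
    INSERT INTO master VALUES (0), (1);
    INSERT INTO vtbl1 VALUES (0, '2001-01-01', 2), (0, '2001-01-03', 5), (1, '2001-01-02', 7);
    INSERT INTO vtbl2 VALUES (0, '2001-01-03', 'on'), (1, '2001-01-02', 'off');
    INSERT INTO vtbl3 VALUES (0, '2001-01-01', 4.5), (1, '2001-01-02', 9.25);
""")

TYPED_PIVOT = """
SELECT id, ts, max(val1) AS val1, max(val2) AS val2, max(val3) AS val3
FROM (
    -- One branch per child table; the other value columns are padded with NULL.
    SELECT m.id, v.ts, v.value AS val1, NULL AS val2, NULL AS val3
    FROM master m
    JOIN vtbl1 v ON v.id = m.id
    WHERE v.ts = (SELECT max(ts) FROM vtbl1 WHERE id = m.id)
    UNION ALL
    SELECT m.id, v.ts, NULL, v.value, NULL
    FROM master m
    JOIN vtbl2 v ON v.id = m.id
    WHERE v.ts = (SELECT max(ts) FROM vtbl2 WHERE id = m.id)
    UNION ALL
    SELECT m.id, v.ts, NULL, NULL, v.value
    FROM master m
    JOIN vtbl3 v ON v.id = m.id
    WHERE v.ts = (SELECT max(ts) FROM vtbl3 WHERE id = m.id)
) t
GROUP BY id, ts          -- max() ignores the NULL padding
ORDER BY id, ts DESC
"""

rows = conn.execute(TYPED_PIVOT).fetchall()
for row in rows:
    print(row)
```

Here val1 comes back as an integer, val2 as text and val3 as a float, with no cast to text anywhere. Unlike the LIMIT 1 form, the max() subquery returns every row that ties on the latest timestamp, so it relies on (id, ts) being unique within each child table.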
Aside: never use basic type names or reserved words (in standard SQL) such as timestamp as identifiers.