Postgres: LEFT JOIN between two identical tables is very slow

Time: 2014-03-28 12:23:50

Tags: sql postgresql

Hi, I can't make sense of a performance problem. I have two tables with identical structure, sensor_values and sensor_values_cleaned. The structure is:

CREATE TABLE sensor_values
(
  ts timestamp with time zone NOT NULL,
  value double precision NOT NULL DEFAULT 'NaN'::real,
  sensor_id integer NOT NULL,
  CONSTRAINT sensor_values_sensor_id_fkey FOREIGN KEY (sensor_id)
      REFERENCES sensors (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT timestamp_sensor_index UNIQUE (ts, sensor_id)
)

Both tables have indexes on the ts and sensor_id fields. (These tables are actually partitioned by year into many child tables.)
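The question does not show how the yearly partitions are defined. Assuming the usual inheritance-based setup (one child table per year with a CHECK constraint on ts), a child table would look roughly like the sketch below; the name sensor_values_2012 and its bounds are hypothetical. The planner can skip such a child table only when constraint_exclusion is enabled and it can prove at plan time that the query's WHERE clause contradicts the child's CHECK constraint.

```sql
-- Hypothetical child table for 2012; the real partition names and
-- CHECK constraints are not shown in the question.
CREATE TABLE sensor_values_2012 (
    CHECK (ts >= '2012-01-01 00:00:00+00' AND ts < '2013-01-01 00:00:00+00')
) INHERITS (sensor_values);

-- Only CHECK and NOT NULL constraints are inherited; indexes, UNIQUE and
-- foreign key constraints must be created on each child table.
CREATE INDEX ON sensor_values_2012 (ts, sensor_id);

-- Partition exclusion depends on this setting ('partition' is the default).
SHOW constraint_exclusion;
```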

The problem query:

SELECT MIN(s1.ts)::timestamptz(0) AS min_time, AVG(s1.value), AVG(s2.value)
FROM sensor_values s1 LEFT JOIN sensor_values_cleaned s2 USING (sensor_id,ts)
 WHERE s1.ts::timestamptz >= '2011-02-25T20:25:07.192132+00:00'::timestamptz AND s1.ts::timestamptz <= '2012-12-31T23:59:59.999999'::timestamp 
 AND s1.sensor_id IN (904 ) GROUP BY s1.ts::timestamptz ORDER BY 1 DESC

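The idea is to get both the raw and the cleaned data for each sensor_id. My original query did some further processing of this data, but I removed it because that part is not slow.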

sensor_id 904 has 88,000 rows in sensor_values and 0 in sensor_values_cleaned.

After several runs, this query takes about 1300 ms. The problem appears when I add another ID to the IN clause:

SELECT MIN(s1.ts)::timestamptz(0) AS min_time, AVG(s1.value), AVG(s2.value)
FROM sensor_values s1 LEFT JOIN sensor_values_cleaned s2 USING (sensor_id,ts)
 WHERE s1.ts::timestamptz >= '2011-02-25T20:25:07.192132+00:00'::timestamptz AND s1.ts::timestamptz <= '2012-12-31T23:59:59.999999'::timestamp 
 AND s1.sensor_id IN (904, 967 ) GROUP BY s1.ts::timestamptz ORDER BY 1 DESC

With a warm cache it takes 15 seconds. The first run takes 40 seconds!

sensor_id 967 has 69,600 rows in sensor_values and 0 in sensor_values_cleaned.

I have run VACUUM ANALYZE.

Does anyone know what causes this, or have any suggestions?

Thanks

My query analysis is at:

https://dl.dropboxusercontent.com/u/189370/query_analyze.txt

1 answer:

Answer 0 (score: 0)

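The last cast (::timestamp) in the following condition may be depriving the planner of the chance to limit which tables (partitions) it searches. Comparing a timestamp with time zone column against a plain timestamp needs a conversion that depends on the session's TimeZone setting, so that comparison is not immutable and the planner cannot use the upper bound at plan time to rule out later partitions: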

WHERE 
    s1.ts::timestamptz >= '2011-02-25T20:25:07.192132+00:00'::timestamptz AND  
    s1.ts::timestamptz <= '2012-12-31T23:59:59.999999'::timestamp 

Try changing it to:

s1.ts >= '2011-02-25t20:25:07.192132+00:00'::timestamptz and 
s1.ts < '2013-01-01'::timestamptz

Then compare and post the new plan. Also, there is no need to cast ts, since it is already timestamp with time zone.
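Applied to the question's slow query (the one with IN (904, 967)), the suggested predicates would look roughly like this. It is a sketch, not a tested fix. Wrapping it in EXPLAIN (ANALYZE, BUFFERS) executes the query and reports the plan actually used, which makes it easy to compare against the original plan and to see which child tables are still scanned.

```sql
-- Same query as in the question, with sargable predicates on s1.ts:
-- no cast on the column, and both bounds are timestamptz constants.
-- '2013-01-01'::timestamptz is read in the session TimeZone, just as the
-- original '...'::timestamp upper bound was when converted for comparison.
EXPLAIN (ANALYZE, BUFFERS)
SELECT MIN(s1.ts)::timestamptz(0) AS min_time, AVG(s1.value), AVG(s2.value)
FROM sensor_values s1
LEFT JOIN sensor_values_cleaned s2 USING (sensor_id, ts)
WHERE s1.ts >= '2011-02-25T20:25:07.192132+00:00'::timestamptz
  AND s1.ts <  '2013-01-01'::timestamptz
  AND s1.sensor_id IN (904, 967)
GROUP BY s1.ts
ORDER BY 1 DESC;
```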

Update

This doesn't explain the behavior, but it might be faster:

select
    sensor_id,
    min(ts)::timestamptz(0) as min_time,
    avg(s1.value),
    avg(s2.value)
from
    (
        select ts, "value", sensor_id
        from sensor_values
        where
            ts >= '2011-02-25t20:25:07.192132+00:00'::timestamptz and ts < '2013-01-01'::timestamptz
            and sensor_id in (904)
    ) s1
    left join (
        select ts, "value", sensor_id
        from sensor_values_cleaned
        where
            ts >= '2011-02-25t20:25:07.192132+00:00'::timestamptz and ts < '2013-01-01'::timestamptz
            and sensor_id in (904)
    ) s2 using (sensor_id,ts)
group by sensor_id
order by 1 desc
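To compare like with like, the update's query can be run with both sensor IDs from the slow case, as sketched below. Note that it groups by sensor_id, so it returns one row per sensor rather than one row per timestamp as the question's query does. Because both subqueries carry the ts range, the planner has constant bounds to check against the partitions of sensor_values_cleaned as well.

```sql
-- The update's query with both sensor IDs from the slow case (904, 967).
-- Filtering both sides gives the planner constant ts bounds for the
-- partitions of sensor_values_cleaned as well as sensor_values.
EXPLAIN (ANALYZE, BUFFERS)
select
    sensor_id,
    min(ts)::timestamptz(0) as min_time,
    avg(s1.value),
    avg(s2.value)
from
    (
        select ts, "value", sensor_id
        from sensor_values
        where
            ts >= '2011-02-25T20:25:07.192132+00:00'::timestamptz and ts < '2013-01-01'::timestamptz
            and sensor_id in (904, 967)
    ) s1
    left join (
        select ts, "value", sensor_id
        from sensor_values_cleaned
        where
            ts >= '2011-02-25T20:25:07.192132+00:00'::timestamptz and ts < '2013-01-01'::timestamptz
            and sensor_id in (904, 967)
    ) s2 using (sensor_id, ts)
group by sensor_id
order by 1 desc;
```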