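Using PostgreSQL

Time: 2016-05-31 18:30:05

Tags: arrays performance postgresql function where-clause

My function has a performance problem and I would like to understand why. I am not used to reading EXPLAIN output, so I am asking for your advice.

I created a function that returns the ids matching a set of search criteria: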

scientific.selectChannels(_measurelabel := '{O3,SO2,NO2}', _siteid:='{41R001}', _networkid := '{RTU}')

It returns an array and runs in under 60 ms:

'{120,122,125}'

When I run this query, the response is very slow:

WITH
D AS (
SELECT
    DISTINCT ON (ChannelId)
    ChannelId
   ,TimeValue
   ,FloatValue
FROM
    datastore.inline
WHERE
    -- ANY('{120,122,125}')
    channelId =  ANY(scientific.selectChannels(_measurelabel := '{O3,SO2,NO2}', _siteid:='{41R001}', _networkid := '{RTU}'))
    AND QualityCodeId = 6
    AND GranulityId = 1
    AND TimeValue >= date_trunc('day', now()::TIMESTAMP) AND TimeValue < date_trunc('day', now()::TIMESTAMP) + '1 day'::INTERVAL
ORDER BY
    ChannelId
   ,TimeValue DESC
)

SELECT
    D.*
   ,CM.ChannelUnitsId
   ,CM.MeasureLabel
   ,CM.SiteId
   ,CM.SiteLabel
FROM
    D JOIN scientific.channelMetadata AS CM ON (D.ChannelId = CM.ChannelId)

The query planner returns:

"Hash Join  (cost=12320.23..12331.54 rows=3 width=332) (actual time=9150.405..9150.463 rows=3 loops=1)"
"  Hash Cond: (s.channelid = d.channelid)"
"  CTE d"
"    ->  Unique  (cost=12191.27..12192.55 rows=1 width=20) (actual time=9146.634..9146.647 rows=3 loops=1)"
"          ->  Sort  (cost=12191.27..12191.91 rows=256 width=20) (actual time=9146.633..9146.644 rows=108 loops=1)"
"                Sort Key: inline.channelid, inline.timevalue DESC"
"                Sort Method: quicksort  Memory: 33kB"
"                ->  Bitmap Heap Scan on inline  (cost=197.79..12181.03 rows=256 width=20) (actual time=6729.253..9146.572 rows=108 loops=1)"
"                      Recheck Cond: ((timevalue >= date_trunc('day'::text, (now())::timestamp without time zone)) AND (timevalue < (date_trunc('day'::text, (now())::timestamp without time zone) + '1 day'::interval)))"
"                      Filter: ((qualitycodeid = 6) AND (granulityid = 1) AND (channelid = ANY (scientific.selectchannels('{41R001}'::text[], '{}'::text[], '{O3,SO2,NO2}'::text[], NULL::text[], '{RTU}'::text[], NULL::text[], NULL::text[], false))))"
"                      Rows Removed by Filter: 6132"
"                      Heap Blocks: exact=104"
"                      ->  Bitmap Index Scan on idx_inline_timevalue  (cost=0.00..197.73 rows=5729 width=0) (actual time=1.016..1.016 rows=6240 loops=1)"
"                            Index Cond: ((timevalue >= date_trunc('day'::text, (now())::timestamp without time zone)) AND (timevalue < (date_trunc('day'::text, (now())::timestamp without time zone) + '1 day'::interval)))"
"  ->  Sort  (cost=127.65..129.39 rows=694 width=450) (actual time=3.714..3.740 rows=694 loops=1)"
"        Sort Key: s.siteid, s.networkid, s.measureid, s.jokerid"
"        Sort Method: quicksort  Memory: 118kB"
"        CTE s"
"          ->  Hash Join  (cost=14.44..81.02 rows=694 width=182) (actual time=0.117..1.219 rows=694 loops=1)"
"                Hash Cond: ((c.unitsid)::text = (u.id)::text)"
"                ->  Hash Join  (cost=12.79..66.35 rows=694 width=150) (actual time=0.098..0.866 rows=694 loops=1)"
"                      Hash Cond: ((c.networkid)::text = (n.id)::text)"
"                      ->  Hash Join  (cost=11.43..55.45 rows=694 width=137) (actual time=0.088..0.644 rows=694 loops=1)"
"                            Hash Cond: ((c.siteid)::text = (s_1.id)::text)"
"                            ->  Hash Join  (cost=5.15..39.63 rows=694 width=122) (actual time=0.052..0.406 rows=694 loops=1)"
"                                  Hash Cond: ((c.measureid)::text = (m.id)::text)"
"                                  ->  Seq Scan on channel c  (cost=0.00..24.94 rows=694 width=97) (actual time=0.001..0.164 rows=694 loops=1)"
"                                  ->  Hash  (cost=3.40..3.40 rows=140 width=31) (actual time=0.047..0.047 rows=140 loops=1)"
"                                        Buckets: 1024  Batches: 1  Memory Usage: 16kB"
"                                        ->  Seq Scan on measure m  (cost=0.00..3.40 rows=140 width=31) (actual time=0.002..0.023 rows=140 loops=1)"
"                            ->  Hash  (cost=3.90..3.90 rows=190 width=22) (actual time=0.032..0.032 rows=101 loops=1)"
"                                  Buckets: 1024  Batches: 1  Memory Usage: 14kB"
"                                  ->  Seq Scan on site s_1  (cost=0.00..3.90 rows=190 width=22) (actual time=0.002..0.014 rows=101 loops=1)"
"                      ->  Hash  (cost=1.16..1.16 rows=16 width=17) (actual time=0.007..0.007 rows=18 loops=1)"
"                            Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                            ->  Seq Scan on network n  (cost=0.00..1.16 rows=16 width=17) (actual time=0.001..0.002 rows=18 loops=1)"
"                ->  Hash  (cost=1.29..1.29 rows=29 width=36) (actual time=0.013..0.013 rows=29 loops=1)"
"                      Buckets: 1024  Batches: 1  Memory Usage: 10kB"
"                      ->  Seq Scan on units u  (cost=0.00..1.29 rows=29 width=36) (actual time=0.003..0.009 rows=29 loops=1)"
"        ->  CTE Scan on s  (cost=0.00..13.88 rows=694 width=450) (actual time=0.119..1.839 rows=694 loops=1)"
"  ->  Hash  (cost=0.02..0.02 rows=1 width=20) (actual time=9146.650..9146.650 rows=3 loops=1)"
"        Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"        ->  CTE Scan on d  (cost=0.00..0.02 rows=1 width=20) (actual time=9146.636..9146.649 rows=3 loops=1)"
"Planning time: 1.931 ms"
"Execution time: 9150.594 ms"

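It takes about 10 seconds to fetch 3 rows. If I replace the function call with the hard-coded result '{120,122,125}', it takes less than 100 ms. The query planner then returns: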

"Hash Join  (cost=376.05..387.36 rows=3 width=332) (actual time=7.483..7.595 rows=3 loops=1)"
"  Hash Cond: (s.channelid = d.channelid)"
"  CTE d"
"    ->  Unique  (cost=247.94..248.37 rows=1 width=20) (actual time=0.172..0.194 rows=3 loops=1)"
"          ->  Sort  (cost=247.94..248.15 rows=87 width=20) (actual time=0.172..0.182 rows=108 loops=1)"
"                Sort Key: inline.channelid, inline.timevalue DESC"
"                Sort Method: quicksort  Memory: 33kB"
"                ->  Index Scan using uq_inline on inline  (cost=0.44..245.13 rows=87 width=20) (actual time=0.036..0.109 rows=108 loops=1)"
"                      Index Cond: ((qualitycodeid = 6) AND (channelid = ANY ('{120,122,125}'::integer[])) AND (granulityid = 1) AND (timevalue >= date_trunc('day'::text, (now())::timestamp without time zone)) AND (timevalue < (date_trunc('day'::text, (now( (...)"
"  ->  Sort  (cost=127.65..129.39 rows=694 width=450) (actual time=7.215..7.269 rows=694 loops=1)"
"        Sort Key: s.siteid, s.networkid, s.measureid, s.jokerid"
"        Sort Method: quicksort  Memory: 118kB"
"        CTE s"
"          ->  Hash Join  (cost=14.44..81.02 rows=694 width=182) (actual time=0.225..2.351 rows=694 loops=1)"
"                Hash Cond: ((c.unitsid)::text = (u.id)::text)"
"                ->  Hash Join  (cost=12.79..66.35 rows=694 width=150) (actual time=0.190..1.661 rows=694 loops=1)"
"                      Hash Cond: ((c.networkid)::text = (n.id)::text)"
"                      ->  Hash Join  (cost=11.43..55.45 rows=694 width=137) (actual time=0.173..1.279 rows=694 loops=1)"
"                            Hash Cond: ((c.siteid)::text = (s_1.id)::text)"
"                            ->  Hash Join  (cost=5.15..39.63 rows=694 width=122) (actual time=0.106..0.817 rows=694 loops=1)"
"                                  Hash Cond: ((c.measureid)::text = (m.id)::text)"
"                                  ->  Seq Scan on channel c  (cost=0.00..24.94 rows=694 width=97) (actual time=0.002..0.306 rows=694 loops=1)"
"                                  ->  Hash  (cost=3.40..3.40 rows=140 width=31) (actual time=0.096..0.096 rows=140 loops=1)"
"                                        Buckets: 1024  Batches: 1  Memory Usage: 16kB"
"                                        ->  Seq Scan on measure m  (cost=0.00..3.40 rows=140 width=31) (actual time=0.005..0.040 rows=140 loops=1)"
"                            ->  Hash  (cost=3.90..3.90 rows=190 width=22) (actual time=0.063..0.063 rows=101 loops=1)"
"                                  Buckets: 1024  Batches: 1  Memory Usage: 14kB"
"                                  ->  Seq Scan on site s_1  (cost=0.00..3.90 rows=190 width=22) (actual time=0.005..0.037 rows=101 loops=1)"
"                      ->  Hash  (cost=1.16..1.16 rows=16 width=17) (actual time=0.011..0.011 rows=18 loops=1)"
"                            Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                            ->  Seq Scan on network n  (cost=0.00..1.16 rows=16 width=17) (actual time=0.002..0.003 rows=18 loops=1)"
"                ->  Hash  (cost=1.29..1.29 rows=29 width=36) (actual time=0.027..0.027 rows=29 loops=1)"
"                      Buckets: 1024  Batches: 1  Memory Usage: 10kB"
"                      ->  Seq Scan on units u  (cost=0.00..1.29 rows=29 width=36) (actual time=0.007..0.016 rows=29 loops=1)"
"        ->  CTE Scan on s  (cost=0.00..13.88 rows=694 width=450) (actual time=0.228..3.590 rows=694 loops=1)"
"  ->  Hash  (cost=0.02..0.02 rows=1 width=20) (actual time=0.202..0.202 rows=3 loops=1)"
"        Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"        ->  CTE Scan on d  (cost=0.00..0.02 rows=1 width=20) (actual time=0.175..0.199 rows=3 loops=1)"
"Planning time: 1.676 ms"
"Execution time: 7.734 ms"

Can someone explain why my function stalls the query? The function is defined as follows:

CREATE OR REPLACE FUNCTION scientific.selectChannels(
    _siteid TEXT[] = '{%}',
    _measureid TEXT[] = '{}',
    _measurelabel TEXT[] = '{%}',
    _jokerid TEXT[] = NULL,
    _networkid TEXT[] = NULL,
    _sitetype TEXT[] = NULL,
    _tags TEXT[] = NULL,
    _inactive BOOLEAN = FALSE
)
RETURNS INTEGER[] AS
$selectChannels$

SELECT
array(
SELECT
    CM.ChannelId
FROM
    scientific.ChannelMetadata AS CM
WHERE
    (CM.SiteId ILIKE ANY(_siteid))
    AND ((CM.MeasureId ILIKE ANY(_measureid)) OR (CM.MeasureLabel LIKE ANY(_measurelabel)))
    AND ((CM.JokerId ILIKE ANY(_jokerid)) OR (_jokerid IS NULL))
    AND (CM.NetworkId ILIKE ANY(_networkid) OR (_networkid IS NULL))
    AND ((CM.SiteType ILIKE ANY(_sitetype)) OR (_sitetype IS NULL) OR (CM.SiteType IS NULL))
    AND ((array_lowercase(_tags) && array_lowercase(CM.Tags)) OR (_tags IS NULL) OR (CM.Tags IS NULL))
    AND (_inactive OR CM.ActiveFlag)
ORDER BY
    -- Do not alter this ORDER BY clause (Python API relies on this invariant: channels are ordered by id):
    CM.ChannelId
);

$selectChannels$
LANGUAGE SQL;

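where channelMetadata is a VIEW joining 5 tables.

I can see "Rows Removed by Filter" in the first run (I do not know how to interpret this message), and it may be part of the answer. The only explanation I can think of is that my function is called several times instead of being evaluated once, which makes it the bottleneck of my query. I do not know how to fix this.

The questions are:

  • Is my function called more than once?
  • Do I have to change the function signature?
  • What am I doing wrong in this query?

Update

If I extract the array and rebuild it inside the query, it performs far better. Why does my function signature slow the query down?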

[...]
channelId = ANY(array(SELECT scientific.selectChannels(_measurelabel := '{O3,SO2,NO2}', _siteid:='{41R001}', _networkid := '{RTU}')))
[...]

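1 answer:

Answer 0 (score: 4)

Is my function called more than once?

Yes.

It is evaluated row by row. In the slow plan the call sits in the Filter of the Bitmap Heap Scan, so it runs for rows fetched from inline rather than once per query: the index on TimeValue finds 6240 rows, the filter removes 6132 of them, and 108 remain. You are actually lucky that the condition on the TimeValue column can be resolved through an index; otherwise the function could be evaluated for every row of the inline table, not just for the rows of the current day.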
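One way to check the number of calls directly, rather than inferring it from the plan, is PostgreSQL's function statistics view. This is a minimal sketch under assumptions: the session is allowed to change track_functions (a superuser setting by default), and the function is not inlined by the planner, since inlined SQL functions are not tracked.

```sql
-- Assumption: this session may change track_functions (superuser by default).
SET track_functions = 'all';

-- Run the slow query here, then read the counters.
-- The counters are cumulative and may lag slightly behind the session,
-- so compare the values before and after the query.
SELECT schemaname, funcname, calls, total_time
FROM pg_stat_user_functions
WHERE schemaname = 'scientific'
  AND funcname = 'selectchannels';  -- unquoted identifiers are folded to lower case
```

If calls grows by much more than one per execution of the query, the function is being re-evaluated per row.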

Since your function does not depend on any value retrieved from the table, you can improve this by putting its result into another CTE:

WITH func_result (channels) AS (
   select scientific.selectChannels(_measurelabel := '{O3,SO2,NO2}', _siteid:='{41R001}', _networkid := '{RTU}')
), D AS (
  SELECT DISTINCT ON (ChannelId) ChannelId ,TimeValue, FloatValue
  FROM datastore.inline
  WHERE 
    channelId = ANY( (select channels from func_result)::int[] ) 
    AND QualityCodeId = 6
    AND GranulityId = 1
    AND TimeValue >= date_trunc('day', now()::TIMESTAMP) AND TimeValue < date_trunc('day', now()::TIMESTAMP) + '1 day'::INTERVAL
  ORDER BY ChannelId, TimeValue DESC
)
SELECT
    D.*
   ,CM.ChannelUnitsId
   ,CM.MeasureLabel
   ,CM.SiteId
   ,CM.SiteLabel
FROM
    D JOIN scientific.channelMetadata AS CM ON (D.ChannelId = CM.ChannelId)

Instead of

FROM datastore.inline
WHERE channelId = ANY( (select channels from func_result)::int[] ) 

you can also use:

FROM datastore.inline
   JOIN func_result f on channelid = any(f.channels)
WHERE QualityCodeId = 6
  AND GranulityId = 1
  AND TimeValue ...

This should cause the function to be called only once.
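The same idea can be sketched without a CTE by placing the function in the FROM clause and expanding its array with unnest, so that the ids form a small row set joined against inline. The aliases c and i are illustrative only, and this variant has not been run against the schema in the question.

```sql
-- A function call in FROM with constant arguments is scanned as a row source
-- instead of being re-evaluated as a per-row filter.
SELECT DISTINCT ON (i.ChannelId)
       i.ChannelId, i.TimeValue, i.FloatValue
FROM unnest(scientific.selectChannels(
         _measurelabel := '{O3,SO2,NO2}',
         _siteid       := '{41R001}',
         _networkid    := '{RTU}')) AS c(ChannelId)  -- c is a hypothetical alias
JOIN datastore.inline AS i ON i.ChannelId = c.ChannelId
WHERE i.QualityCodeId = 6
  AND i.GranulityId = 1
  AND i.TimeValue >= date_trunc('day', now()::TIMESTAMP)
  AND i.TimeValue <  date_trunc('day', now()::TIMESTAMP) + '1 day'::INTERVAL
ORDER BY i.ChannelId, i.TimeValue DESC;
```

Because the ids arrive as rows, the planner is free to look up each one through an index on inline, provided a suitable index exists.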

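In theory, you could trick Postgres into calling the function only once by defining it as immutable. But that would be a blatant lie (and a very dirty trick), because the function does perform database lookups, and a function that does so should never be defined as immutable.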
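A volatility category that does fit a read-only lookup is STABLE: it promises only that the function does not modify the database and returns the same result for the same arguments within a single statement. Functions are VOLATILE by default, which is what the definition in the question gets. The following is a sketch under the assumption that selectChannels, and the array_lowercase helper it calls, only read data; whether the planner then chooses the index scan on uq_inline has to be verified with EXPLAIN ANALYZE.

```sql
-- Assumption: the function only reads data, so STABLE is a truthful label.
-- The argument list mirrors the signature in the question:
-- seven TEXT[] parameters followed by one BOOLEAN.
ALTER FUNCTION scientific.selectChannels(
    TEXT[], TEXT[], TEXT[], TEXT[], TEXT[], TEXT[], TEXT[], BOOLEAN
) STABLE;
```

With a STABLE function the optimizer is allowed, though not obliged, to evaluate the call once and use its result in an index condition, which is the shape of the fast plan obtained with the hard-coded array.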