Postgres query planner not using a multi-column index properly?

Posted: 2017-10-05 13:14:53

Tags: postgresql indexing query-planner

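I'm running a complex query (a longitudinal analysis of visits) against a medium-sized database (the largest table in the query has 91 million records).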

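My problem is that a single table join in this query (data1, marked with comments in the code below) accounts for more than 50% of the query cost. It shows up in the query plan as an "Index Scan", but that is misleading, because it only uses the second column of a two-column index. The PK is (programinstanceid, dataelementid) and dataelementid has no index of its own, so Postgres is effectively doing a "full table scan" of the index (i.e., it has to scan every entry in the index to evaluate the second column).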

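Oddly, I included both columns in the join condition precisely to take advantage of this two-column index, but Postgres ignores that: it handles the first column separately with a single-column (and very fast) index lookup, and leaves the second column to be handled on its own by this painfully slow scan. (Elsewhere in the query I have a similar join, data2, which Postgres correctly handles using the two-column index; it is also marked with comments in the code below.)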
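
To make the index mechanics concrete, here is a minimal sketch of why a condition on the second column alone cannot drive a seek into a B-tree keyed on (first, second). It uses Python's bundled sqlite3 module only because it is self-contained; SQLite's planner is not Postgres's, so this illustrates the general leftmost-prefix rule rather than what Postgres does with this particular query. The table tedv and the columns psi and de are hypothetical stand-ins for trackedentitydatavalue, programstageinstanceid and dataelementid.

```python
import sqlite3

# Hypothetical miniature of trackedentitydatavalue: the composite primary key
# (psi, de) stands in for (programstageinstanceid, dataelementid).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tedv (psi INTEGER, de INTEGER, value TEXT, "
    "PRIMARY KEY (psi, de))"
)
conn.executemany(
    "INSERT INTO tedv VALUES (?, ?, ?)",
    [(psi, de, "x") for psi in range(1000) for de in (111111, 222222)],
)


def plan(sql, params=()):
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Both key columns constrained: the B-tree can seek on (psi, de).
both = plan("SELECT value FROM tedv WHERE psi = ? AND de = ?", (42, 111111))
# Only the leading column: still a seek, on psi alone.
leading = plan("SELECT value FROM tedv WHERE psi = ?", (42,))
# Only the second column: no seek is possible, so the planner scans.
trailing = plan("SELECT value FROM tedv WHERE de = ?", (111111,))

for label, detail in (("both", both), ("leading", leading), ("trailing", trailing)):
    print(f"{label:>8}: {detail}")
```

The first two plans report a SEARCH through the primary-key index, while the query constrained only on de falls back to a SCAN, which is the analogue of the slow data1 step described above.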

Can anyone see a sensible reason why the Postgres query planner does this with the data1 join, or is this actually a "bug" in the query planner?

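More importantly, is there any way to hint Postgres to change its strategy and use both columns of the two-column index for this join? (This is a GA open-source application, so I can't change the indexes.)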
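
One generic way to get per-row probes on both key columns is to fix the join order so that the filtered event rows form the outer loop and each one looks up (programstageinstanceid, dataelementid) in the composite index. The sketch below shows this in SQLite, where the left operand of CROSS JOIN is documented to always be the outer loop; the schema is again a hypothetical miniature. Postgres has no equivalent join syntax. Its closest built-in levers are planner settings such as join_collapse_limit and enable_hashjoin / enable_mergejoin, or the third-party pg_hint_plan extension, and whether any of these would help with this query is untested here.

```python
import sqlite3

# Hypothetical miniature schema; names are stand-ins for the DHIS2 tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (psi INTEGER PRIMARY KEY, stage INTEGER);
    CREATE TABLE tedv (psi INTEGER, de INTEGER, value TEXT,
                       PRIMARY KEY (psi, de));
""")
# Odd-numbered events belong to stage 444444 (500 of them).
conn.executemany("INSERT INTO event VALUES (?, ?)",
                 [(i, 444444 if i % 2 else 1) for i in range(1000)])
conn.executemany("INSERT INTO tedv VALUES (?, ?, ?)",
                 [(i, de, "x") for i in range(1000) for de in (111111, 222222)])

# SQLite-specific hint: CROSS JOIN keeps event as the outer loop, so every
# qualifying event row probes tedv by (psi, de) through the primary-key index.
sql = """
    SELECT e.psi, d.value
    FROM event AS e CROSS JOIN tedv AS d
    WHERE d.psi = e.psi AND d.de = 111111 AND e.stage = 444444
"""
detail = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
for line in detail:
    print(line)

rows = conn.execute(sql).fetchall()
print(len(rows), "matching rows")
```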

The query (with specific parameters changed to protect the innocent) and the query plan are both below:

QUERY

SELECT
  abc1.value ABC,
  orgunit1.shortname InitialServiceOrgUnit,
  data1.value InitialService,
  event1.executiondate InitialServiceDate,
  substring(
    MIN(
      event2.executiondate
      ||
      data2.value
    )
  ,11) FollowUpService,
  to_date(
    substring(
      MIN(
        event2.executiondate
        ||
        data2.value
      )
    ,1,10)
  , 'YYYY-MM-DD') FollowUpServiceDate,
  event1.executiondate
    -
    to_date(
      substring(
        MIN(
          event2.executiondate
          ||
          data2.value
        )
      ,1,10)
    , 'YYYY-MM-DD')
  as FollowUpDays
FROM
  programstageinstance event1
  -- THIS IS THE OFFENDING JOIN, WHICH INCLUDES BOTH COLUMNS IN THE
  -- INDEX, BUT THE QUERY PLANNER PROCESSES EACH COLUMN SEPARATELY
  INNER JOIN trackedentitydatavalue data1
    ON (
      event1.programstageinstanceid = data1.programstageinstanceid
      AND data1.dataelementid IN (111111)
    )
  INNER JOIN trackedentitydatavalue eventdate1
    ON (
      event1.programstageinstanceid = eventdate1.programstageinstanceid
      AND eventdate1.dataelementid = 222222
    )
  INNER JOIN organisationunit orgunit1 ON event1.organisationunitid = orgunit1.organisationunitid
  INNER JOIN programinstance enrol1 ON enrol1.programinstanceid = event1.programinstanceid
  INNER JOIN psi_view_trackedentityattributevalue abc1
    ON abc1.trackedentityinstanceid = enrol1.trackedentityinstanceid
       AND abc1.trackedentityattributeid = 333333
  LEFT JOIN psi_view_trackedentityattributevalue abc2
    ON abc1.value = abc2.value
       AND abc2.trackedentityattributeid = 333333
  LEFT JOIN programinstance enrol2 ON abc2.trackedentityinstanceid = enrol2.trackedentityinstanceid
  LEFT JOIN programstageinstance event2
    ON enrol2.programinstanceid = event2.programinstanceid
    AND event2.programstageid = 444444
    AND event1.programstageinstanceid <> event2.programstageinstanceid
    AND event1.organisationunitid = event2.organisationunitid
  LEFT JOIN trackedentitydatavalue eventdate2
    ON (
      event2.programstageinstanceid = eventdate2.programstageinstanceid
      AND eventdate2.dataelementid = 222222
    )
  -- THIS IS A SIMILAR JOIN, WHICH IS CORRECTLY PROCESSED BY THE
  -- QUERY PLANNER AT MUCH LOWER COST USING THE MULTI-COLUMN INDEX
  LEFT JOIN trackedentitydatavalue data2
    ON (
    event2.programstageinstanceid = data2.programstageinstanceid
    AND data2.dataelementid IN (555555, 666666)
    )
    AND (data2.value = 'xxx' OR data2.value = 'yyy')
WHERE
  event1.programstageid = 444444
  AND
  (
    event2.executiondate > event1.executiondate
    OR
    event2.executiondate is NULL
  )
GROUP BY
ABC, InitialServiceOrgUnit, InitialService, InitialServiceDate

QUERY PLAN

Link to image of query plan (it wouldn't indent properly on Stack Overflow!)
