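I am running a complex query (a longitudinal analysis of visits) against a medium-sized database (the largest table in the query has 91 million records).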
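My problem is that a single table join in this query (data1, commented in the code below) accounts for more than 50% of the query cost. It shows up in the query plan as an "Index Scan", but that is deceptive, because it uses only the second column of a two-column index. The PK is (programstageinstanceid, dataelementid), and dataelementid has no index of its own, which means that Postgres is effectively doing a "full table scan" of the index (i.e., it has to scan every entry in the index to match on the second column).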
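Oddly, I included both columns in the join to take advantage of this two-column index, but Postgres ignores this: it handles the first column separately with a (very fast) lookup on a single-column index, and leaves the second column to be handled on its own by this painfully slow scan. (I have an equivalent join at another point, data2, which Postgres processes correctly using the two-column index; it is also commented in the code below.)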
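To look at the data1 join in isolation, a reduced query along the following lines could be run under EXPLAIN. This is only a sketch: the table and column names are taken from the full query, the ids are the same placeholders, and the reduced query is an assumption about how to narrow things down, so its plan may differ from the one chosen inside the full query. Note that EXPLAIN ANALYZE actually executes the statement.

```sql
-- Reduced version of the data1 join (assumption: same tables and
-- placeholder ids as the full query, everything else stripped out).
EXPLAIN (ANALYZE, BUFFERS)
SELECT data1.value
FROM programstageinstance event1
INNER JOIN trackedentitydatavalue data1
    ON event1.programstageinstanceid = data1.programstageinstanceid
   AND data1.dataelementid = 111111
WHERE event1.programstageid = 444444;
```

In the output, the "Index Cond" line of the index scan node lists the columns actually used for the lookup. If only dataelementid appears there, the scan is not using the leading column of the index.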
Can anyone see a rational reason for the Postgres query planner to handle the data1 join this way, or is this effectively a "bug" in the query planner?
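More importantly, is there any way to hint Postgres to change its strategy and use both columns of the two-column index for this join? (This is a GA open-source application, so I can't change the indexes.)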
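Core Postgres has no hint syntax, so the closest thing I am aware of is adjusting planner settings for a single session. The following is a sketch of such an experiment, not a confirmed fix. The settings themselves are standard Postgres configuration parameters, but the value 16 and the choice of which join methods to discourage are assumptions for illustration. The collapse limits default to 8, and this query joins 11 relations, so the planner may not be considering every join order.

```sql
-- Session-local experiment; nothing here changes the schema or the indexes.
BEGIN;

-- Both limits default to 8. Raising them above the number of joined
-- relations lets the planner consider reordering all of the joins.
SET LOCAL join_collapse_limit = 16;
SET LOCAL from_collapse_limit = 16;

-- Optionally discourage hash and merge joins, so that a nested loop with
-- an inner index lookup on both index columns looks cheaper by comparison.
SET LOCAL enable_hashjoin = off;
SET LOCAL enable_mergejoin = off;

-- Run EXPLAIN (ANALYZE, BUFFERS) on the full query here and compare plans.

-- SET LOCAL only lasts until the end of the transaction.
ROLLBACK;
```

The enable_* settings do not forbid a plan type outright; they only make it look very expensive to the planner, so they are better suited to diagnosis than to production use.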
The query (with specific parameters changed to protect the innocent) and the query plan are both below:
QUERY
    SELECT
        abc1.value ABC,
        orgunit1.shortname InitialServiceOrgUnit,
        data1.value InitialService,
        event1.executiondate InitialServiceDate,
        substring(MIN(event2.executiondate || data2.value), 11) FollowUpService,
        to_date(substring(MIN(event2.executiondate || data2.value), 1, 10), 'YYYY-MM-DD') FollowUpServiceDate,
        event1.executiondate
            - to_date(substring(MIN(event2.executiondate || data2.value), 1, 10), 'YYYY-MM-DD')
            as FollowUpDays
    FROM
        programstageinstance event1
        -- THIS IS THE OFFENDING JOIN, WHICH INCLUDES BOTH COLUMNS IN THE
        -- INDEX, BUT THE QUERY PLANNER PROCESSES EACH COLUMN SEPARATELY
        INNER JOIN trackedentitydatavalue data1
            ON (
                event1.programstageinstanceid = data1.programstageinstanceid
                AND data1.dataelementid IN (111111)
            )
        INNER JOIN trackedentitydatavalue eventdate1
            ON (
                event1.programstageinstanceid = eventdate1.programstageinstanceid
                AND eventdate1.dataelementid = 222222
            )
        INNER JOIN organisationunit orgunit1 ON event1.organisationunitid = orgunit1.organisationunitid
        INNER JOIN programinstance enrol1 ON enrol1.programinstanceid = event1.programinstanceid
        INNER JOIN psi_view_trackedentityattributevalue abc1
            ON abc1.trackedentityinstanceid = enrol1.trackedentityinstanceid
            AND abc1.trackedentityattributeid = 333333
        LEFT JOIN psi_view_trackedentityattributevalue abc2
            ON abc1.value = abc2.value
            AND abc2.trackedentityattributeid = 333333
        LEFT JOIN programinstance enrol2 ON abc2.trackedentityinstanceid = enrol2.trackedentityinstanceid
        LEFT JOIN programstageinstance event2
            ON enrol2.programinstanceid = event2.programinstanceid
            AND event2.programstageid = 444444
            AND event1.programstageinstanceid <> event2.programstageinstanceid
            AND event1.organisationunitid = event2.organisationunitid
        LEFT JOIN trackedentitydatavalue eventdate2
            ON (
                event2.programstageinstanceid = eventdate2.programstageinstanceid
                AND eventdate2.dataelementid = 222222
            )
        -- THIS IS A SIMILAR JOIN, WHICH IS CORRECTLY PROCESSED BY THE
        -- QUERY PLANNER AT MUCH LOWER COST USING THE MULTI-COLUMN INDEX
        LEFT JOIN trackedentitydatavalue data2
            ON (
                event2.programstageinstanceid = data2.programstageinstanceid
                AND data2.dataelementid IN (555555, 666666)
            )
            AND (data2.value = 'xxx' OR data2.value = 'yyy')
    WHERE
        event1.programstageid = 444444
        AND
        (
            event2.executiondate > event1.executiondate
            OR
            event2.executiondate is NULL
        )
    GROUP BY
        ABC, InitialServiceOrgUnit, InitialService, InitialServiceDate
QUERY PLAN
Link to image of query plan (it wouldn't indent properly in StackOverflow!)