Question

I am trying to optimize this query:

EXPLAIN ANALYZE  
select
  dtt.matching_protein_seq_ids
from detected_transcript_translation dtt
join peptide_spectrum_match psm 
    on psm.detected_transcript_translation_id = 
       dtt.detected_transcript_translation_id
join peptide_spectrum_match_sequence psms 
    on  psm.peptide_spectrum_match_sequence_id = 
       psms.peptide_spectrum_match_sequence_id
WHERE
dtt.matching_protein_seq_ids && ARRAY[654819, 294711]
;

When sequential scans are allowed (set enable_seqscan = on), the optimizer chooses a very bad plan that runs in 49.85 seconds:

https://explain.depesz.com/s/WKbew

With set enable_seqscan = off, the chosen plan uses the proper indexes and the query runs almost instantly.

https://explain.depesz.com/s/ISHV

Note that I did run ANALYZE on all three tables...

Answer 1

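Your problem is that PostgreSQL cannot estimate the selectivity of the WHERE condition well, so it falls back to assuming that a fixed percentage of the table's estimated total rows will match, which is far too many.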
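As a rough sketch, the misestimate can be checked in isolation by running the array-overlap filter against the single table and comparing the planner's estimated row count with the actual one in the output. The table and column names are taken from the question; the choice of selected column is arbitrary.

```sql
-- Run only the filter from the question, without the joins.
-- In the output, compare the estimated "rows=" in the cost section
-- with the "rows=" reported in the "actual" section of the same node.
EXPLAIN ANALYZE
SELECT dtt.detected_transcript_translation_id
FROM detected_transcript_translation dtt
WHERE dtt.matching_protein_seq_ids && ARRAY[654819, 294711];
```

A large gap between the two numbers on the scan node would suggest that the bad join plan comes from this estimate.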

If you know that such queries will always return only a few result rows, you can cheat by defining a function:

CREATE OR REPLACE FUNCTION matching_transcript_translations(integer[])
   RETURNS SETOF detected_transcript_translation
   LANGUAGE SQL
   STABLE STRICT
   ROWS 2  /* pretend there are always exactly two matching rows */
AS
'SELECT * FROM detected_transcript_translation
   WHERE matching_protein_seq_ids && $1';

You can then use it like this:

select
  dtt.matching_protein_seq_ids
from matching_transcript_translations(ARRAY[654819, 294711]) dtt
join peptide_spectrum_match psm 
    on psm.detected_transcript_translation_id = 
       dtt.detected_transcript_translation_id
join peptide_spectrum_match_sequence psms 
    on  psm.peptide_spectrum_match_sequence_id = 
       psms.peptide_spectrum_match_sequence_id;

PostgreSQL should then be fooled into thinking that there will be exactly two matching rows, as declared by ROWS 2.

However, if there are many matching rows, the resulting plan will be even worse than your current one...
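If the row count is not reliably small, one alternative sketch is to keep the original query and limit the enable_seqscan = off setting from the question to a single transaction with SET LOCAL, so that other queries in the session still use the default planner settings. This goes beyond the function approach above and is offered only as an assumption about what may fit this workload.

```sql
BEGIN;

-- SET LOCAL lasts only until the end of the current transaction.
SET LOCAL enable_seqscan = off;

SELECT dtt.matching_protein_seq_ids
FROM detected_transcript_translation dtt
JOIN peptide_spectrum_match psm
    ON psm.detected_transcript_translation_id =
       dtt.detected_transcript_translation_id
JOIN peptide_spectrum_match_sequence psms
    ON psm.peptide_spectrum_match_sequence_id =
       psms.peptide_spectrum_match_sequence_id
WHERE dtt.matching_protein_seq_ids && ARRAY[654819, 294711];

COMMIT;
```

Note that enable_seqscan = off only discourages sequential scans by making them look very expensive; the planner can still pick one when no other access path exists.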
