Why does SQLite rescan the table with DISTINCT?

Posted: 2018-09-26 21:55:18

Tags: sqlite distinct

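I need to get the distinct values of ref and alt. My query is very efficient until I add DISTINCT, at which point it rescans the base table. Since I have a temporary table, shouldn't it simply use that as the data source?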

sqlite> explain query plan
   ...> select t1.ref, t1.alt from (SELECT * from Sample_szes where str_id = 'STR_832206') as t1;

selectid|order|from|detail
1|0|0|SEARCH TABLE vcfBase AS base USING INDEX vcfBase_strid_idx (str_id=?) (~10 rows)
1|1|1|SEARCH TABLE vcfhomozyg AS hzyg USING INDEX homozyg_strid_idx (str_id=?) (~10 rows)
2|0|0|SEARCH TABLE vcfBase AS base USING INDEX vcfBase_strid_idx (str_id=?) (~10 rows)
2|1|1|SEARCH TABLE vcfAlt AS alt USING INDEX vcfAlt_strid_idx (str_id=?) (~2 rows)
2|2|2|SEARCH TABLE altGT AS gt USING INDEX altGT_strid_idx (str_id=?) (~2 rows)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

Adding DISTINCT makes it rescan the large base table:

sqlite> explain query plan
   ...> select distinct t1.ref, t1.alt from (SELECT * from Sample_szes where str_id = 'STR_832206') as t1;

selectid|order|from|detail
2|0|0|SCAN TABLE vcfBase AS base (~1000000 rows)
2|1|1|SEARCH TABLE vcfhomozyg AS hzyg USING INDEX homozyg_strid_idx (str_id=?) (~10 rows)
3|0|0|SCAN TABLE vcfBase AS base (~1000000 rows)
3|1|1|SEARCH TABLE vcfAlt AS alt USING INDEX vcfAlt_strid_idx (str_id=?) (~2 rows)
3|2|2|SEARCH TABLE altGT AS gt USING INDEX altGT_strid_idx (str_id=?) (~2 rows)
1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
0|0|0|SCAN SUBQUERY 1 (~1400000 rows)
0|0|0|USE TEMP B-TREE FOR DISTINCT
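The definition of Sample_szes was not posted, so the following is only a minimal sketch built on a hypothetical, simplified stand-in: tables named after those in the plans, indexes on str_id, and a view Sample_szes that combines two joins with UNION ALL. The column lists, the view body and the sample rows are all assumptions. It prints both plans on whatever SQLite library Python's sqlite3 module links against, and checks that the two queries return the same (ref, alt) pairs apart from duplicates.

```python
import sqlite3

# HYPOTHETICAL, simplified stand-in for the question's schema; the real
# column lists and the real view definition were not posted.
SCHEMA = """
CREATE TABLE vcfBase    (str_id TEXT, ref TEXT);
CREATE TABLE vcfhomozyg (str_id TEXT, alt TEXT);
CREATE TABLE vcfAlt     (str_id TEXT, alt TEXT);
CREATE INDEX vcfBase_strid_idx ON vcfBase(str_id);
CREATE INDEX homozyg_strid_idx ON vcfhomozyg(str_id);
CREATE INDEX vcfAlt_strid_idx  ON vcfAlt(str_id);
CREATE VIEW Sample_szes AS
  SELECT base.str_id, base.ref, hzyg.alt
    FROM vcfBase AS base JOIN vcfhomozyg AS hzyg ON hzyg.str_id = base.str_id
  UNION ALL
  SELECT base.str_id, base.ref, alt.alt
    FROM vcfBase AS base JOIN vcfAlt AS alt ON alt.str_id = base.str_id;
"""

# The two queries from the question, with the literal turned into a parameter.
PLAIN = ("SELECT t1.ref, t1.alt FROM "
         "(SELECT * FROM Sample_szes WHERE str_id = ?) AS t1")
DISTINCT = ("SELECT DISTINCT t1.ref, t1.alt FROM "
            "(SELECT * FROM Sample_szes WHERE str_id = ?) AS t1")


def plan(conn, sql, params):
    """Return the 'detail' column of EXPLAIN QUERY PLAN (the last column in every version)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def build():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO vcfBase VALUES (?, ?)",
                     [("STR_832206", "A"), ("STR_832206", "A"), ("STR_1", "C")])
    conn.executemany("INSERT INTO vcfhomozyg VALUES (?, ?)",
                     [("STR_832206", "G"), ("STR_1", "T")])
    conn.executemany("INSERT INTO vcfAlt VALUES (?, ?)",
                     [("STR_832206", "G"), ("STR_832206", "T")])
    return conn


if __name__ == "__main__":
    conn = build()
    key = ("STR_832206",)
    print("SQLite", sqlite3.sqlite_version)
    for label, sql in (("without DISTINCT", PLAIN), ("with DISTINCT", DISTINCT)):
        print("--", label)
        for detail in plan(conn, sql, key):
            print("  ", detail)
    print(conn.execute(DISTINCT, key).fetchall())
```

Plan choices depend on the SQLite version and on table statistics, so on tiny, un-ANALYZEd tables like these the printed plans need not match the ones posted above.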

2 Answers:

Answer 0 (score: 1)

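You should create a composite index on the ref and alt columns; the index will then be used. Otherwise a temporary B-tree (an index) is created, which requires a full scan in order to sort the data into it.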
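A minimal sketch of this suggestion, assuming a single hypothetical table variants(str_id, ref, alt) instead of the question's UNION ALL view: it compares the plan of SELECT DISTINCT ref, alt before and after adding a composite index on (ref, alt). The table and index names are made up for illustration.

```python
import sqlite3


def plan_details(conn, sql):
    """Collect the 'detail' strings of EXPLAIN QUERY PLAN (last column in all versions)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def distinct_plans():
    # HYPOTHETICAL single table holding ref/alt pairs; not the question's schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE variants (str_id TEXT, ref TEXT, alt TEXT)")
    query = "SELECT DISTINCT ref, alt FROM variants"
    before = plan_details(conn, query)
    # Composite index on (ref, alt), as suggested above; it delivers rows
    # already ordered by (ref, alt), so duplicates arrive next to each other.
    conn.execute("CREATE INDEX variants_ref_alt ON variants(ref, alt)")
    after = plan_details(conn, query)
    return before, after


if __name__ == "__main__":
    before, after = distinct_plans()
    print("before:", before)
    print("after: ", after)
```

This only shows the effect on a plain table; whether the planner can use such an index when DISTINCT is applied to a UNION ALL subquery, as in the question, is not demonstrated here.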

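I believe the explanation is the following:

If a SELECT query contains an ORDER BY, GROUP BY or DISTINCT clause, SQLite may need to use a temporary b-tree structure to sort the output rows. Or, it might use an index. Using an index is almost always more efficient than performing a sort.

If a temporary b-tree is required, a record is added to the EXPLAIN QUERY PLAN output with the "detail" field set to a string value of the form "USE TEMP B-TREE FOR xxx", where xxx is one of "ORDER BY", "GROUP BY" or "DISTINCT". For example:

sqlite> EXPLAIN QUERY PLAN SELECT c, d FROM t2 ORDER BY c;
QUERY PLAN
|--SCAN TABLE t2
`--USE TEMP B-TREE FOR ORDER BY

In this case using the temporary b-tree can be avoided by creating an index on t2(c), as follows:

sqlite> CREATE INDEX i4 ON t2(c);
sqlite> EXPLAIN QUERY PLAN SELECT c, d FROM t2 ORDER BY c;
QUERY PLAN
`--SCAN TABLE t2 USING INDEX i4

EXPLAIN QUERY PLAN - 1.2. Temporary Sorting B-Trees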
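The documentation example can be reproduced from Python's sqlite3 module. In this sketch the table and index names (t2, i4) come from the excerpt, while the column definitions of t2 are assumed. Recent SQLite versions print "SCAN t2" rather than "SCAN TABLE t2", so it is safer to look for substrings than to compare whole lines.

```python
import sqlite3


def eqp(conn, sql):
    """Return the 'detail' strings from EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def order_by_demo():
    conn = sqlite3.connect(":memory:")
    # Names t2 and i4 follow the documentation excerpt; the columns are assumed.
    conn.execute("CREATE TABLE t2 (c, d)")
    query = "SELECT c, d FROM t2 ORDER BY c"
    before = eqp(conn, query)   # expected to need a temporary b-tree for the sort
    conn.execute("CREATE INDEX i4 ON t2(c)")
    after = eqp(conn, query)    # expected to walk index i4 in order instead
    return before, after


if __name__ == "__main__":
    before, after = order_by_demo()
    print("before:", before)
    print("after: ", after)
```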

Answer 1 (score: 0)

I think I may have found the answer. On my Mac I have the following version of SQLite:

SQLite version 3.19.3 2017-06-27 16:48:08

sqlite> explain query plan
   ...> select distinct t1.ref, t1.alt from (SELECT * from Sample_szes where str_id = 'STR_832206') as t1;
2|0|1|SEARCH TABLE vcfhomozyg AS hzyg USING INDEX homozyg_strid_idx (str_id=?)
2|1|0|SEARCH TABLE vcfBase AS base USING INDEX vcfBase_strid_idx (str_id=?)
3|0|1|SEARCH TABLE vcfAlt AS alt USING INDEX vcfAlt_strid_idx (str_id=?)
3|1|0|SEARCH TABLE vcfBase AS base USING INDEX vcfBase_strid_idx (str_id=?)
3|2|2|SEARCH TABLE altGT AS gt USING INDEX altGT_strid_idx (str_id=?)
1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
0|0|0|SCAN SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR DISTINCT
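Since this answer attributes the difference to the SQLite version, it helps to check which library version actually runs the query: a language binding such as Python's sqlite3 module may link a different SQLite than the sqlite3 command-line shell on the same machine. A minimal sketch; the comparison against 3.19.3 only reuses the version reported above and is not a claim about which release changed the plan.

```python
import sqlite3


def library_version(conn):
    """Ask the linked SQLite library itself which version it is."""
    (version,) = conn.execute("SELECT sqlite_version()").fetchone()
    return version


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    version = library_version(conn)
    print("SQLite", version)
    # 3.19.3 is the version reported in this answer; used only as a reference point.
    print("at least 3.19.3:", sqlite3.sqlite_version_info >= (3, 19, 3))
```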