PostgreSQL GROUP BY on 30 million rows

Time: 2018-11-10 19:06:04

Tags: postgresql performance rdbms database-performance

Is there any way to speed up a dynamic GROUP BY query? I have a table with 30 million rows.

create table if not exists tb
(
    id serial not null constraint tb_pkey primary key,
    week integer,
    month integer,
    year integer,
    starttime varchar(20),
    endtime varchar(20),
    brand smallint,
    category smallint,
    value real
);

The query below takes 8.5 seconds.

SELECT category FROM tb GROUP BY category;

Is there any way to make it faster? I have tried it both with and without an index.

1 answer:

Answer 0 (score: 1)

For that exact query, not really; executing it requires scanning every row. There is no way around that.
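One way to confirm this on your own data is to look at the plan. The sketch below assumes the tb table from the question; the exact plan nodes and timings will depend on your PostgreSQL version and settings.

```sql
-- Show the actual plan and buffer usage for the query from the question.
-- The node feeding the aggregate should be a scan over all of tb
-- (or over a full index on category, if one exists).
EXPLAIN (ANALYZE, BUFFERS)
SELECT category FROM tb GROUP BY category;
```

In the output, the scan node under the aggregate should report row counts on the order of the table size (per worker, if the plan is parallel), which is where the time goes.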

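However, if what you want is to get the set of unique categories quickly, and you have an index on that column, you can use a variant of the WITH RECURSIVE example shown in the edit to the question linked here (look toward the end of the question):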

Counting distinct rows using recursive cte over non-distinct index

You need to change it to return the set of unique values instead of counting them, but that looks like a simple change:

testdb=# create table tb(id bigserial, category smallint);
CREATE TABLE
testdb=# insert into tb(category) select 2 from generate_series(1, 10000)
testdb-# ;
INSERT 0 10000
testdb=# insert into tb(category) select 1 from generate_series(1, 10000);
INSERT 0 10000
testdb=# insert into tb(category) select 3 from generate_series(1, 10000);
INSERT 0 10000
testdb=# create index on tb(category);
CREATE INDEX
testdb=# WITH RECURSIVE cte AS
  (
     (SELECT category
      FROM tb
      WHERE category >= 0
      ORDER BY 1
      LIMIT 1)
   UNION ALL SELECT
     (SELECT category
      FROM tb
      WHERE category > c.category
      ORDER BY 1
      LIMIT 1)
   FROM cte c
   WHERE category IS NOT NULL)
SELECT category
FROM cte
WHERE category IS NOT NULL;
 category 
----------
        1
        2
        3
(3 rows)

Here is the EXPLAIN ANALYZE output:

    QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on cte  (cost=40.66..42.68 rows=100 width=2) (actual time=0.057..0.127 rows=3 loops=1)
   Filter: (category IS NOT NULL)
   Rows Removed by Filter: 1
   CTE cte
     ->  Recursive Union  (cost=0.29..40.66 rows=101 width=2) (actual time=0.052..0.119 rows=4 loops=1)
           ->  Limit  (cost=0.29..0.33 rows=1 width=2) (actual time=0.051..0.051 rows=1 loops=1)
                 ->  Index Only Scan using tb_category_idx on tb tb_1  (cost=0.29..1363.29 rows=30000 width=2) (actual time=0.050..0.050 rows=1 loops=1)
                       Index Cond: (category >= 0)
                       Heap Fetches: 1
           ->  WorkTable Scan on cte c  (cost=0.00..3.83 rows=10 width=2) (actual time=0.015..0.015 rows=1 loops=4)
                 Filter: (category IS NOT NULL)
                 Rows Removed by Filter: 0
                 SubPlan 1
                   ->  Limit  (cost=0.29..0.36 rows=1 width=2) (actual time=0.016..0.016 rows=1 loops=3)
                         ->  Index Only Scan using tb_category_idx on tb  (cost=0.29..755.95 rows=10000 width=2) (actual time=0.015..0.015 rows=1 loops=3)
                               Index Cond: (category > c.category)
                               Heap Fetches: 2
 Planning time: 0.224 ms
 Execution time: 0.191 ms
(19 rows)

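The number of loops the WorkTable Scan node has to execute equals the number of unique categories you have plus one (4 loops for the 3 categories above), so it should stay very fast up to, say, hundreds of unique values.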
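Applied to the table from the question, the same idea might look like the sketch below. It assumes an index on tb(category); the name tb_category_idx is only the name PostgreSQL would generate by default, and IF NOT EXISTS needs PostgreSQL 9.5 or later. It also drops the category >= 0 condition so that negative values would be returned too; that change is an assumption here, not part of the example above.

```sql
-- Index that lets each step jump straight to the next distinct value.
CREATE INDEX IF NOT EXISTS tb_category_idx ON tb (category);

-- Refresh statistics and the visibility map so index-only scans are likely.
VACUUM ANALYZE tb;

WITH RECURSIVE cte AS (
    -- Anchor: smallest category (NULLs sort last in ascending order).
    (SELECT category
     FROM tb
     ORDER BY category
     LIMIT 1)
    UNION ALL
    -- Step: smallest category strictly greater than the previous one.
    -- The scalar subquery yields NULL when no larger value exists,
    -- which ends the recursion on the next iteration.
    SELECT (SELECT category
            FROM tb
            WHERE category > c.category
            ORDER BY category
            LIMIT 1)
    FROM cte c
    WHERE c.category IS NOT NULL
)
SELECT category
FROM cte
WHERE category IS NOT NULL;
```

Each step is a single index probe, so the cost should grow with the number of distinct categories rather than with the 30 million rows.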

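Another approach is to add a separate table that stores only the unique values of tb.category, and have your application logic check that table and insert the value whenever that column is inserted or updated. This can also be done on the database side with a trigger. That solution is also discussed in the answers to the linked question.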
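A minimal sketch of the trigger variant follows. The names tb_category, tb_track_category and tb_track_category_trg are hypothetical, and ON CONFLICT assumes PostgreSQL 9.5 or later.

```sql
-- Lookup table holding one row per distinct category.
CREATE TABLE tb_category (
    category smallint PRIMARY KEY
);

-- Seed it once from the existing data (this is the one slow scan).
INSERT INTO tb_category (category)
SELECT DISTINCT category
FROM tb
WHERE category IS NOT NULL;

-- Record any category value written to tb that is not yet known.
CREATE FUNCTION tb_track_category() RETURNS trigger AS $$
BEGIN
    IF NEW.category IS NOT NULL THEN
        INSERT INTO tb_category (category)
        VALUES (NEW.category)
        ON CONFLICT (category) DO NOTHING;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tb_track_category_trg
AFTER INSERT OR UPDATE OF category ON tb
FOR EACH ROW
EXECUTE PROCEDURE tb_track_category();

-- The list of categories is now a scan of a tiny table.
SELECT category FROM tb_category ORDER BY category;
```

Note that this sketch never removes a value from tb_category when the last row using it is deleted or updated away, so the lookup table can contain categories that no longer appear in tb. A row-level trigger also adds some overhead to every insert, which may matter for bulk loads.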