Is there any way to speed up a dynamic GROUP BY query? I have a table with 30 million rows.
create table if not exists tb
(
id serial not null constraint tb_pkey primary key,
week integer,
month integer,
year integer,
starttime varchar(20),
endtime varchar(20),
brand smallint,
category smallint,
value real
);
The following query takes 8.5 seconds:
SELECT category from tb group by category
Is there any way to make it faster? I have tried it both with and without an index on category.
Answer (score: 1)
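For that exact query, not really: it has to scan every row, and there is no way around that.
However, if what you want is to get the set of distinct categories quickly, and you have an index on that column, you can use a variant of the WITH RECURSIVE example shown in the edit to this question (look toward the end of the question):
Counting distinct rows using recursive cte over non-distinct index
You need to change it to return the set of distinct values instead of counting them, but that is a simple change: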
testdb=# create table tb(id bigserial, category smallint);
CREATE TABLE
testdb=# insert into tb(category) select 2 from generate_series(1, 10000)
testdb-# ;
INSERT 0 10000
testdb=# insert into tb(category) select 1 from generate_series(1, 10000);
INSERT 0 10000
testdb=# insert into tb(category) select 3 from generate_series(1, 10000);
INSERT 0 10000
testdb=# create index on tb(category);
CREATE INDEX
testdb=# WITH RECURSIVE cte AS
(
(SELECT category
FROM tb
WHERE category >= 0
ORDER BY 1
LIMIT 1)
UNION ALL SELECT
(SELECT category
FROM tb
WHERE category > c.category
ORDER BY 1
LIMIT 1)
FROM cte c
WHERE category IS NOT NULL)
SELECT category
FROM cte
WHERE category IS NOT NULL;
category
----------
1
2
3
(3 rows)
Here is the EXPLAIN ANALYZE output:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on cte (cost=40.66..42.68 rows=100 width=2) (actual time=0.057..0.127 rows=3 loops=1)
  Filter: (category IS NOT NULL)
  Rows Removed by Filter: 1
  CTE cte
    -> Recursive Union (cost=0.29..40.66 rows=101 width=2) (actual time=0.052..0.119 rows=4 loops=1)
          -> Limit (cost=0.29..0.33 rows=1 width=2) (actual time=0.051..0.051 rows=1 loops=1)
                -> Index Only Scan using tb_category_idx on tb tb_1 (cost=0.29..1363.29 rows=30000 width=2) (actual time=0.050..0.050 rows=1 loops=1)
                      Index Cond: (category >= 0)
                      Heap Fetches: 1
          -> WorkTable Scan on cte c (cost=0.00..3.83 rows=10 width=2) (actual time=0.015..0.015 rows=1 loops=4)
                Filter: (category IS NOT NULL)
                Rows Removed by Filter: 0
                SubPlan 1
                  -> Limit (cost=0.29..0.36 rows=1 width=2) (actual time=0.016..0.016 rows=1 loops=3)
                        -> Index Only Scan using tb_category_idx on tb (cost=0.29..755.95 rows=10000 width=2) (actual time=0.015..0.015 rows=1 loops=3)
                              Index Cond: (category > c.category)
                              Heap Fetches: 2
Planning time: 0.224 ms
Execution time: 0.191 ms
(19 rows)
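The WorkTable Scan node has to loop once per distinct category plus one (loops=4 for the 3 categories above), and each loop is a single index probe. The cost therefore grows with the number of distinct values rather than with the number of rows, so it should stay very fast up to, say, a few hundred distinct values.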
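The anchor row in the query above filters on category >= 0, so any negative values of the signed smallint column would be silently skipped. A sketch that makes no assumption about the sign is to anchor on IS NOT NULL instead. It is shown against a hypothetical scratch table tb_neg (not from the question) that contains negative values and a NULL.

```sql
-- Hypothetical scratch table with negative values and a NULL.
DROP TABLE IF EXISTS tb_neg;
CREATE TABLE tb_neg (id bigserial, category smallint);
INSERT INTO tb_neg (category)
SELECT (g % 4) - 2 FROM generate_series(1, 1000) AS g;  -- values -2, -1, 0, 1
INSERT INTO tb_neg (category) VALUES (NULL);
CREATE INDEX ON tb_neg (category);

WITH RECURSIVE cte AS (
    -- Anchor: the smallest non-NULL value, whatever its sign.
    (SELECT category
     FROM tb_neg
     WHERE category IS NOT NULL
     ORDER BY category
     LIMIT 1)
    UNION ALL
    -- Step: jump to the next larger value; yields NULL when none is left,
    -- which stops the recursion on the following iteration.
    SELECT (SELECT t.category
            FROM tb_neg t
            WHERE t.category > c.category
            ORDER BY t.category
            LIMIT 1)
    FROM cte c
    WHERE c.category IS NOT NULL
)
SELECT category FROM cte WHERE category IS NOT NULL;
```

The NULL row is never returned, because the anchor excludes it and the step only looks for values greater than a non-NULL one.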
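Another approach is to add a separate table that stores only the distinct values of tb.category, and have the application check that table and insert the value whenever that column is inserted or updated. This can also be done on the database side with a trigger. That solution is also discussed in the answers to the linked question.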
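A minimal sketch of the trigger variant, assuming PostgreSQL 9.5 or later (for INSERT ... ON CONFLICT). The names tb_categories, tb_categories_sync and tb_categories_sync_trg are hypothetical; only tb and its category column come from the question.

```sql
-- Stand-in for the question's table; skipped if tb already exists.
CREATE TABLE IF NOT EXISTS tb (id bigserial PRIMARY KEY, category smallint);

-- Hypothetical lookup table holding each distinct category once.
CREATE TABLE IF NOT EXISTS tb_categories (category smallint PRIMARY KEY);

-- Trigger function: record the row's category if it is not there yet.
-- ON CONFLICT DO NOTHING also copes with concurrent inserts of the same value.
CREATE OR REPLACE FUNCTION tb_categories_sync() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF NEW.category IS NOT NULL THEN
        INSERT INTO tb_categories (category)
        VALUES (NEW.category)
        ON CONFLICT (category) DO NOTHING;
    END IF;
    RETURN NEW;  -- ignored for AFTER triggers, but harmless
END;
$$;

-- Fire on inserts and on updates that touch the category column.
DROP TRIGGER IF EXISTS tb_categories_sync_trg ON tb;
CREATE TRIGGER tb_categories_sync_trg
    AFTER INSERT OR UPDATE OF category ON tb
    FOR EACH ROW EXECUTE PROCEDURE tb_categories_sync();

-- One-time backfill from existing rows (the only full scan of tb).
INSERT INTO tb_categories (category)
SELECT DISTINCT category FROM tb WHERE category IS NOT NULL
ON CONFLICT (category) DO NOTHING;

-- Demo: new rows are picked up by the trigger.
INSERT INTO tb (category) VALUES (1), (2), (2), (3);

-- Listing the distinct categories now reads only the small lookup table.
SELECT category FROM tb_categories ORDER BY category;
```

The trade-off is a little extra work on every insert, and this sketch never removes a category whose last row is deleted or updated away; handling that would need an additional DELETE/UPDATE trigger that checks whether any rows with the old value remain.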