I have a very simple SQL query:
SELECT COUNT(DISTINCT x) FROM table;
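My table has about 1.5 million rows. This query runs quite slowly; it takes about 7.5 s, compared to
SELECT COUNT(x) FROM table;
which takes about 435 ms. Is there any way to change my query to improve performance? I've tried grouping and doing a regular count, as well as putting an index on x; both have the same 7.5 s execution time.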
Answer 0 (score: 248)
You can use this:
SELECT COUNT(*) FROM (SELECT DISTINCT column_name FROM table_name) AS temp;
This is much faster than:
COUNT(DISTINCT column_name)
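One caveat with this rewrite: COUNT(DISTINCT column_name) ignores NULLs, while SELECT DISTINCT keeps NULL as one row of its own, so the two forms can differ by one when the column is nullable. A minimal sketch that should return the same number as COUNT(DISTINCT column_name), using the same placeholder names as above:

```sql
-- Same idea as the subquery rewrite, but NULLs are filtered out first
-- so the result matches COUNT(DISTINCT column_name) on a nullable column.
SELECT COUNT(*)
FROM (
    SELECT DISTINCT column_name
    FROM table_name
    WHERE column_name IS NOT NULL
) AS temp;
```

If the column is declared NOT NULL, the extra WHERE clause changes nothing and can be dropped.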
Answer 1 (score: 10)
-- My default settings (this is basically a single-session machine, so work_mem is pretty high)
SET effective_cache_size='2048MB';
SET work_mem='16MB';
\echo original
EXPLAIN ANALYZE
SELECT
COUNT (distinct val) as aantal
FROM one
;
\echo group by+count(*)
EXPLAIN ANALYZE
SELECT
distinct val
-- , COUNT(*)
FROM one
GROUP BY val;
\echo with CTE
EXPLAIN ANALYZE
WITH agg AS (
SELECT distinct val
FROM one
GROUP BY val
)
SELECT COUNT (*) as aantal
FROM agg
;
Results:
original QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=36448.06..36448.07 rows=1 width=4) (actual time=1766.472..1766.472 rows=1 loops=1)
-> Seq Scan on one (cost=0.00..32698.45 rows=1499845 width=4) (actual time=31.371..185.914 rows=1499845 loops=1)
Total runtime: 1766.642 ms
(3 rows)
group by+count(*)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=36464.31..36477.31 rows=1300 width=4) (actual time=412.470..412.598 rows=1300 loops=1)
-> HashAggregate (cost=36448.06..36461.06 rows=1300 width=4) (actual time=412.066..412.203 rows=1300 loops=1)
-> Seq Scan on one (cost=0.00..32698.45 rows=1499845 width=4) (actual time=26.134..166.846 rows=1499845 loops=1)
Total runtime: 412.686 ms
(4 rows)
with CTE
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=36506.56..36506.57 rows=1 width=0) (actual time=408.239..408.239 rows=1 loops=1)
CTE agg
-> HashAggregate (cost=36464.31..36477.31 rows=1300 width=4) (actual time=407.704..407.847 rows=1300 loops=1)
-> HashAggregate (cost=36448.06..36461.06 rows=1300 width=4) (actual time=407.320..407.467 rows=1300 loops=1)
-> Seq Scan on one (cost=0.00..32698.45 rows=1499845 width=4) (actual time=24.321..165.256 rows=1499845 loops=1)
-> CTE Scan on agg (cost=0.00..26.00 rows=1300 width=0) (actual time=407.707..408.154 rows=1300 loops=1)
Total runtime: 408.300 ms
(7 rows)
The same plan as for the CTE could probably also be produced by other methods (window functions).
Answer 2 (score: 2)
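If your count(distinct(x)) is significantly slower than count(x), then you can speed up this query by maintaining counts of the x values in a separate table, for example table_name_x_counts (x integer not null, x_count int not null), using triggers. But your write performance will suffer, and if you update multiple x values in a single transaction you will need to do so in some explicit order to avoid possible deadlocks.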
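To make the trigger idea concrete, here is a rough sketch, not a tested implementation. It assumes PostgreSQL 11 or later (for EXECUTE FUNCTION; ON CONFLICT needs 9.5+), a base table called table_name with an integer column x, and a primary key on the counts table. The function and trigger names are made up for illustration.

```sql
-- Counts table from the answer, with a primary key added (an assumption)
-- so that ON CONFLICT can target it.
CREATE TABLE table_name_x_counts (
    x       integer NOT NULL PRIMARY KEY,
    x_count integer NOT NULL
);

-- One-off backfill from the existing rows; NULLs are skipped because
-- COUNT(DISTINCT x) ignores them as well.
INSERT INTO table_name_x_counts (x, x_count)
SELECT x, COUNT(*) FROM table_name WHERE x IS NOT NULL GROUP BY x;

-- Hypothetical trigger function that keeps the counts in sync.
CREATE FUNCTION table_name_x_counts_maintain() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- NEW is only referenced for INSERT and UPDATE.
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        IF NEW.x IS NOT NULL THEN
            INSERT INTO table_name_x_counts (x, x_count)
            VALUES (NEW.x, 1)
            ON CONFLICT (x) DO UPDATE
                SET x_count = table_name_x_counts.x_count + 1;
        END IF;
    END IF;

    -- OLD is only referenced for DELETE and UPDATE.
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        IF OLD.x IS NOT NULL THEN
            UPDATE table_name_x_counts
               SET x_count = x_count - 1
             WHERE x = OLD.x;
            -- Drop values that no longer occur in the base table.
            DELETE FROM table_name_x_counts
             WHERE x = OLD.x AND x_count = 0;
        END IF;
    END IF;

    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$;

CREATE TRIGGER table_name_x_counts_trg
AFTER INSERT OR UPDATE OR DELETE ON table_name
FOR EACH ROW EXECUTE FUNCTION table_name_x_counts_maintain();

-- The distinct count then becomes a count over the much smaller table.
SELECT COUNT(*) FROM table_name_x_counts;
```

Every write to table_name now also touches a row in table_name_x_counts, which is where the write overhead and the lock-ordering concern mentioned above come from.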
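Answer 3 (score: 0)
I was also searching for the same answer, because at some point I needed a total_count with distinct values along with limit/offset.
It is a bit tricky to get the total count with distinct values together with limit/offset; getting a total count with limit/offset is usually hard. Finally I found a way to do it: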
SELECT DISTINCT COUNT(*) OVER() as total_count, * FROM table_name limit 2 offset 0;
Query performance is also good.
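One thing to watch with the query above: as far as I understand PostgreSQL's evaluation order, window functions are computed before DISTINCT and before LIMIT/OFFSET, so COUNT(*) OVER() there counts all rows of the table rather than the distinct ones. If the goal is a page of distinct x values plus the number of distinct values, a sketch like the following may be closer (table_name and x are placeholders, and the ORDER BY is my addition so that pages are repeatable):

```sql
-- Deduplicate first in a derived table, then count its rows with a window
-- function. The window is evaluated before LIMIT/OFFSET, so every returned
-- row carries the total number of distinct x values.
SELECT x,
       COUNT(*) OVER () AS total_count
FROM (SELECT DISTINCT x FROM table_name) AS d
ORDER BY x
LIMIT 2 OFFSET 0;
```

Note that SELECT DISTINCT treats NULL as a value of its own, so total_count here includes one NULL entry if the column contains NULLs.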
Answer 4 (score: -2)
select coluna, count(coluna) as qtd from tabela group by coluna