My query returns, for each field, the number of rows in which that field is not NULL:
SELECT COUNT(field1) AS field1, COUNT(field2) AS field2, COUNT(field3) AS field3
FROM (
SELECT field1, field2, field3
FROM table1, table2
WHERE table1.id=table2.idt1
ORDER BY table1.id ASC
LIMIT 10000
) AS rq
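To see exactly what COUNT(field) counts, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table layout follows the question, but the rows are hypothetical. COUNT(*) counts every joined row, while COUNT(fieldN) skips the rows where that field is NULL.

```python
import sqlite3

# Hypothetical stand-in for the question's tables (SQLite instead of PostgreSQL).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT);
    CREATE TABLE table2 (idt1 INTEGER REFERENCES table1(id), field3 INTEGER);
    INSERT INTO table1 VALUES (1, 'a', 'x'), (2, 'a', NULL), (3, 'b', 'x'), (4, NULL, 'y');
    INSERT INTO table2 VALUES (1, 10), (1, 20), (2, 10), (3, NULL), (4, 30);
""")

# COUNT(*) counts every joined row; COUNT(fieldN) ignores rows where fieldN IS NULL.
row = con.execute("""
    SELECT COUNT(*), COUNT(field1), COUNT(field2), COUNT(field3)
    FROM (
        SELECT field1, field2, field3
        FROM table1, table2
        WHERE table1.id = table2.idt1
        ORDER BY table1.id ASC
        LIMIT 10000
    ) AS rq
""").fetchone()
print(row)  # (5, 4, 4, 4)
```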
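table1.id is the primary key of table1, and table2.idt1 is the foreign key in table2 that references it. This query works very well, but what if I need the DISTINCT count for each field, like this: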
SELECT COUNT(DISTINCT(field1)) AS field1, COUNT(DISTINCT(field2)) AS field2, COUNT(DISTINCT(field3)) AS field3
FROM (
SELECT field1, field2, field3
FROM table1, table2
WHERE table1.id=table2.idt1
ORDER BY table1.id ASC
LIMIT 10000
) AS rq
That's where the trouble starts. The query runs and does its job, but of course it is much slower than without DISTINCT.
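The DISTINCT variant still ignores NULLs and also collapses repeated values. This minimal sketch runs the same kind of query on hypothetical data, with SQLite standing in for PostgreSQL. It shows only the semantics and says nothing about performance.

```python
import sqlite3

# Hypothetical stand-in for the question's tables (SQLite instead of PostgreSQL).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT);
    CREATE TABLE table2 (idt1 INTEGER REFERENCES table1(id), field3 INTEGER);
    INSERT INTO table1 VALUES (1, 'a', 'x'), (2, 'a', NULL), (3, 'b', 'x'), (4, NULL, 'y');
    INSERT INTO table2 VALUES (1, 10), (1, 20), (2, 10), (3, NULL), (4, 30);
""")

# Distinct non-NULL values per field over the same limited join.
distinct_counts = con.execute("""
    SELECT COUNT(DISTINCT field1), COUNT(DISTINCT field2), COUNT(DISTINCT field3)
    FROM (
        SELECT field1, field2, field3
        FROM table1, table2
        WHERE table1.id = table2.idt1
        ORDER BY table1.id ASC
        LIMIT 10000
    ) AS rq
""").fetchone()
print(distinct_counts)  # (2, 2, 3)
```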
Every field in table1 and table2 has a btree index:
CREATE INDEX field1_index ON table1 USING btree (field1)
CREATE INDEX field2_index ON table1 USING btree (field2)
CREATE INDEX field3_index ON table2 USING btree (field3)
How can I speed up this DISTINCT count? Would a better index help?
Thanks for your help
Answer 0 (score: 0)
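Postgres does not optimize COUNT(DISTINCT) very well, and you have several such expressions, which makes it harder. I would suggest using window functions and conditional aggregation: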
SELECT SUM(CASE WHEN seqnum_1 = 1 AND field1 IS NOT NULL THEN 1 ELSE 0 END) as field1,
SUM(CASE WHEN seqnum_2 = 1 AND field2 IS NOT NULL THEN 1 ELSE 0 END) as field2,
SUM(CASE WHEN seqnum_3 = 1 AND field3 IS NOT NULL THEN 1 ELSE 0 END) as field3
FROM (SELECT field1, field2, field3,
ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY field1) as seqnum_1,
ROW_NUMBER() OVER (PARTITION BY field2 ORDER BY field2) as seqnum_2,
ROW_NUMBER() OVER (PARTITION BY field3 ORDER BY field3) as seqnum_3
FROM table1 JOIN
table2
ON table1.id=table2.idt1
ORDER BY table1.id ASC
LIMIT 10000
) rq
Edit:
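I realized that row_number() is evaluated before the LIMIT is applied. As a result, the first version numbers rows across the whole join instead of only across the 10000 limited rows. Try this version: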
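-- a NULL value gets its own partition, so skip it to match COUNT(DISTINCT), which ignores NULLs
SELECT SUM(CASE WHEN seqnum_1 = 1 AND field1 IS NOT NULL THEN 1 ELSE 0 END) as field1,
SUM(CASE WHEN seqnum_2 = 1 AND field2 IS NOT NULL THEN 1 ELSE 0 END) as field2,
SUM(CASE WHEN seqnum_3 = 1 AND field3 IS NOT NULL THEN 1 ELSE 0 END) as field3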
FROM (SELECT field1, field2, field3,
ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY field1) as seqnum_1,
ROW_NUMBER() OVER (PARTITION BY field2 ORDER BY field2) as seqnum_2,
ROW_NUMBER() OVER (PARTITION BY field3 ORDER BY field3) as seqnum_3
FROM (SELECT field1, field2, field3
FROM table1 JOIN
table2
ON table1.id = table2.idt1
ORDER BY table1.id ASC
LIMIT 10000
) t
) rq
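One way to sanity-check the window-function approach is to compare it with COUNT(DISTINCT) on the same limited rows. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or newer), and the table contents are hypothetical. A NULL value forms its own partition, so each CASE also tests IS NOT NULL to match COUNT(DISTINCT). The check covers correctness only, not speed.

```python
import sqlite3

# Hypothetical stand-in for the question's tables (SQLite instead of PostgreSQL).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT);
    CREATE TABLE table2 (idt1 INTEGER REFERENCES table1(id), field3 INTEGER);
    INSERT INTO table1 VALUES (1, 'a', 'x'), (2, 'a', NULL), (3, 'b', 'x'), (4, NULL, 'y');
    INSERT INTO table2 VALUES (1, 10), (1, 20), (2, 10), (3, NULL), (4, 30);
""")

# ORDER BY/LIMIT run in the innermost subquery, so ROW_NUMBER() only sees the limited rows.
LIMITED = """
    SELECT field1, field2, field3
    FROM table1 JOIN table2 ON table1.id = table2.idt1
    ORDER BY table1.id ASC
    LIMIT ?
"""

WINDOW_VERSION = f"""
    SELECT SUM(CASE WHEN seqnum_1 = 1 AND field1 IS NOT NULL THEN 1 ELSE 0 END),
           SUM(CASE WHEN seqnum_2 = 1 AND field2 IS NOT NULL THEN 1 ELSE 0 END),
           SUM(CASE WHEN seqnum_3 = 1 AND field3 IS NOT NULL THEN 1 ELSE 0 END)
    FROM (SELECT field1, field2, field3,
                 ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY field1) AS seqnum_1,
                 ROW_NUMBER() OVER (PARTITION BY field2 ORDER BY field2) AS seqnum_2,
                 ROW_NUMBER() OVER (PARTITION BY field3 ORDER BY field3) AS seqnum_3
          FROM ({LIMITED}) AS t) AS rq
"""

COUNT_DISTINCT_VERSION = f"""
    SELECT COUNT(DISTINCT field1), COUNT(DISTINCT field2), COUNT(DISTINCT field3)
    FROM ({LIMITED}) AS rq
"""

# A small limit and a large one; both queries should agree in each case.
for limit in (3, 10000):
    window = con.execute(WINDOW_VERSION, (limit,)).fetchone()
    distinct = con.execute(COUNT_DISTINCT_VERSION, (limit,)).fetchone()
    assert window == distinct, (limit, window, distinct)
    print(limit, window)  # 3 (1, 1, 2), then 10000 (2, 2, 3)
```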
Answer 1 (score: 0)
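I tried something similar on a big table (12 million rows).
Without DISTINCT it takes 10 seconds.
With the DISTINCT code it takes 19 seconds.
Putting the DISTINCT in the subquery takes 11 seconds: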
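-- DISTINCT may appear only once, at the start of a select list, and applies to the whole row,
-- so each column is deduplicated in its own subquery over the same 10000 rows
WITH rq AS (
SELECT field1, field2, field3
FROM table1, table2
WHERE table1.id=table2.idt1
ORDER BY table1.id ASC
LIMIT 10000
)
SELECT (SELECT COUNT(field1) FROM (SELECT DISTINCT field1 FROM rq) AS d1) AS field1,
(SELECT COUNT(field2) FROM (SELECT DISTINCT field2 FROM rq) AS d2) AS field2,
(SELECT COUNT(field3) FROM (SELECT DISTINCT field3 FROM rq) AS d3) AS field3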
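SELECT DISTINCT deduplicates the whole select list. Counting the rows of SELECT DISTINCT field1, field2, field3 therefore gives the number of distinct combinations, not a distinct count per field. The sketch below checks this on a tiny hypothetical dataset, with SQLite standing in for PostgreSQL. It then contrasts that result with deduplicating each column separately over the same limited rows.

```python
import sqlite3

# Hypothetical stand-in for the question's tables (SQLite instead of PostgreSQL).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT);
    CREATE TABLE table2 (idt1 INTEGER REFERENCES table1(id), field3 INTEGER);
    INSERT INTO table1 VALUES (1, 'a', 'x'), (2, 'a', NULL), (3, 'b', 'x'), (4, NULL, 'y');
    INSERT INTO table2 VALUES (1, 10), (1, 20), (2, 10), (3, NULL), (4, 30);
""")

LIMITED = """
    SELECT field1, field2, field3
    FROM table1, table2
    WHERE table1.id = table2.idt1
    ORDER BY table1.id ASC
    LIMIT 10000
"""

# Whole-row DISTINCT: removes only duplicate (field1, field2, field3) combinations.
(combos,) = con.execute(
    f"SELECT COUNT(*) FROM (SELECT DISTINCT field1, field2, field3 FROM ({LIMITED}) AS t) AS d"
).fetchone()

# Per-column DISTINCT: each field is deduplicated on its own; COUNT(col) drops the NULL entry.
per_column = con.execute(f"""
    WITH rq AS ({LIMITED})
    SELECT (SELECT COUNT(field1) FROM (SELECT DISTINCT field1 FROM rq) AS d1),
           (SELECT COUNT(field2) FROM (SELECT DISTINCT field2 FROM rq) AS d2),
           (SELECT COUNT(field3) FROM (SELECT DISTINCT field3 FROM rq) AS d3)
""").fetchone()

print(combos, per_column)  # 5 (2, 2, 3)
```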
Also, if you only want to filter out NULL data, you can do that in the WHERE clause instead of using DISTINCT.