Speeding up a count

Time: 2016-07-05 11:57:42

Tags: sql postgresql

My query returns the number of non-null values for each field.

SELECT COUNT(field1) AS field1, COUNT(field2) AS field2, COUNT(field3) AS field3
FROM (
    SELECT field1, field2, field3
    FROM table1, table2
    WHERE table1.id=table2.idt1 
    ORDER BY table1.id ASC
    LIMIT 10000
) AS rq

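table1.id is the primary key of table1, and table2.idt1 is a secondary key of table2. This query works very well, but if I need to return the DISTINCT count of each field, like this: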

SELECT COUNT(DISTINCT(field1)) AS field1, COUNT(DISTINCT(field2)) AS field2, COUNT(DISTINCT(field3)) AS field3
FROM (
    SELECT field1, field2, field3
    FROM table1, table2
    WHERE table1.id=table2.idt1 
    ORDER BY table1.id ASC
    LIMIT 10000
) AS rq

the trouble starts. The query runs and does the job, but it is of course much slower than without the DISTINCT clause.

Every field of table1 and table2 has a btree index:

CREATE INDEX field1_index ON table1 USING btree (field1);
CREATE INDEX field2_index ON table1 USING btree (field2);
CREATE INDEX field3_index ON table2 USING btree (field3);

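How can I speed up this DISTINCT count? Is there perhaps a better index?

Thanks for your help.

2 answers:

Answer 0: (score: 0)

Postgres does not optimize COUNT(DISTINCT) very well. You have several such expressions, which makes it even harder. I would suggest using window functions and conditional aggregation: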

SELECT SUM(CASE WHEN seqnum_1 = 1 THEN 1 ELSE 0 END) as field1, 
       SUM(CASE WHEN seqnum_2 = 1 THEN 1 ELSE 0 END) as field2, 
       SUM(CASE WHEN seqnum_3 = 1 THEN 1 ELSE 0 END) as field3 
FROM (SELECT field1, field2, field3,
             ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY field1) as seqnum_1,
             ROW_NUMBER() OVER (PARTITION BY field2 ORDER BY field2) as seqnum_2,
             ROW_NUMBER() OVER (PARTITION BY field3 ORDER BY field3) as seqnum_3
      FROM table1 JOIN
           table2
           ON table1.id=table2.idt1 
      ORDER BY table1.id ASC
      LIMIT 10000
     ) rq

Edit:

I realized that row_number() is probably evaluated before the limit. Try this version:

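-- NULLs form their own partition; exclude them so the result matches COUNT(DISTINCT)
SELECT SUM(CASE WHEN seqnum_1 = 1 AND field1 IS NOT NULL THEN 1 ELSE 0 END) as field1, 
       SUM(CASE WHEN seqnum_2 = 1 AND field2 IS NOT NULL THEN 1 ELSE 0 END) as field2, 
       SUM(CASE WHEN seqnum_3 = 1 AND field3 IS NOT NULL THEN 1 ELSE 0 END) as field3 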
FROM (SELECT field1, field2, field3,
             ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY field1) as seqnum_1,
             ROW_NUMBER() OVER (PARTITION BY field2 ORDER BY field2) as seqnum_2,
             ROW_NUMBER() OVER (PARTITION BY field3 ORDER BY field3) as seqnum_3
      FROM (SELECT field1, field2, field3
            FROM table1 JOIN
                 table2
                 ON table1.id = table2.idt1 
            ORDER BY table1.id ASC
            LIMIT 10000
           ) t
     ) rq
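
To illustrate why the LIMIT has to move into an inner subquery, here is a small self-contained sketch. The inline VALUES rows and the names v, id and f are made up for illustration and are not part of the original schema. The window is ordered by id DESC only to make the numbering deterministic.

```sql
-- Hypothetical inline data: three rows, two distinct values of f.

-- Window function and LIMIT at the same query level:
-- rn is numbered over all three rows, then ORDER BY / LIMIT trim the result.
SELECT id, f,
       ROW_NUMBER() OVER (PARTITION BY f ORDER BY id DESC) AS rn
FROM (VALUES (1, 'a'), (2, 'b'), (3, 'a')) AS v(id, f)
ORDER BY id
LIMIT 2;
-- id 1 gets rn = 2 (id 3 took rn = 1 in partition 'a'),
-- so only one of the two returned rows has rn = 1.

-- LIMIT applied first in a subquery, window function outside:
SELECT id, f,
       ROW_NUMBER() OVER (PARTITION BY f ORDER BY id DESC) AS rn
FROM (SELECT id, f
      FROM (VALUES (1, 'a'), (2, 'b'), (3, 'a')) AS v(id, f)
      ORDER BY id
      LIMIT 2) AS t;
-- Only ids 1 and 2 are numbered, so both rows have rn = 1,
-- matching the two distinct values of f in the limited set.
```

Summing the rn = 1 rows gives 1 for the first form and 2 for the second, and only the second agrees with a distinct count over the limited rows.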

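Answer 1: (score: 0)

I tried something similar on a large table (12 million rows).

Without DISTINCT it takes 10 seconds.

With the DISTINCT code, it takes 19 seconds.

Putting the DISTINCT in the subquery takes 11 seconds: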

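-- DISTINCT applies to the whole select list, not to individual columns,
-- so this counts values over distinct (field1, field2, field3) rows.
SELECT COUNT(field1) AS field1, COUNT(field2) AS field2, COUNT(field3) AS field3
FROM (
    SELECT DISTINCT field1, field2, field3
    FROM (
        SELECT field1, field2, field3
        FROM table1, table2
        WHERE table1.id=table2.idt1
        ORDER BY table1.id ASC
        LIMIT 10000
    ) AS t
) AS rq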
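
If one distinct count per column is what is needed, a variant that keeps DISTINCT inside subqueries might look like the sketch below. It reuses the join and LIMIT from the question. The CTE name rq and the aliases d1, d2 and d3 are illustrative only, and whether it is actually faster would have to be measured on the real data.

```sql
-- Sketch only: same join, ORDER BY and LIMIT as in the question, wrapped in a CTE.
WITH rq AS (
    SELECT field1, field2, field3
    FROM table1
    JOIN table2 ON table1.id = table2.idt1
    ORDER BY table1.id ASC
    LIMIT 10000
)
SELECT
    -- COUNT(*) over de-duplicated non-NULL values equals COUNT(DISTINCT fieldN)
    (SELECT COUNT(*) FROM (SELECT DISTINCT field1 FROM rq WHERE field1 IS NOT NULL) AS d1) AS field1,
    (SELECT COUNT(*) FROM (SELECT DISTINCT field2 FROM rq WHERE field2 IS NOT NULL) AS d2) AS field2,
    (SELECT COUNT(*) FROM (SELECT DISTINCT field3 FROM rq WHERE field3 IS NOT NULL) AS d3) AS field3;
```

Prefixing each candidate query with EXPLAIN ANALYZE is one way to compare the plans and timings.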

Also, if you only want to filter out NULL data, you can do that in the WHERE clause instead of using DISTINCT.
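
A minimal sketch of that idea, assuming the same tables as in the question and filtering a single column. Note that COUNT(field1) already skips NULLs on its own, so this form mainly helps when the non-NULL rows themselves are needed.

```sql
-- The outer WHERE drops whole rows before counting,
-- so this pattern suits one column at a time.
SELECT COUNT(*) AS field1
FROM (
    SELECT field1
    FROM table1, table2
    WHERE table1.id = table2.idt1
    ORDER BY table1.id ASC
    LIMIT 10000
) AS rq
WHERE field1 IS NOT NULL;
```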