GROUP BY with COUNT query takes a very long time

Time: 2015-09-24 07:06:35

Tags: postgresql postgresql-9.3

I have a table like this:

   Column    |            Type             |                       Modifiers                       
-------------+-----------------------------+-------------------------------------------------------
 id          | integer                     | not null default nextval('oks_id_seq'::regclass)
 uname       | text                        | not null
 ess         | text                        | 
 quest       | text                        | 
 details     | text                        | 
 status      | character(1)                | not null default 'q'::bpchar
 last_parsed | timestamp without time zone | 
 qstatus     | character(1)                | not null default 'q'::bpchar
 media_wc    | integer                     | not null default 0
Indexes:
    "oks_pkey" PRIMARY KEY, btree (id)
    "oks_uname_key" UNIQUE CONSTRAINT, btree (uname)
    "last_parsed_idx" btree (last_parsed)
    "qstatus_idx" btree (qstatus)
    "status_idx" btree (status)

I have a query like this:

SELECT COUNT(status), status FROM oks GROUP BY status ORDER BY status;

The result is:

  count  | status 
---------+--------
 1478472 | d
   23599 | p
   10178 | q
 6278206 | s
(4 rows)

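That result is fine, but the query takes forever, and for some reason Postgres seems to keep the whole index on disk, because disk activity is very high while the query runs.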

Sort  (cost=1117385.91..1117385.92 rows=4 width=2) (actual time=54122.991..54122.993 rows=4 loops=1)
   Sort Key: status
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=1117385.82..1117385.86 rows=4 width=2) (actual time=54122.280..54122.283 rows=4 loops=1)
         ->  Seq Scan on oks  (cost=0.00..1078433.55 rows=7790455 width=2) (actual time=0.009..47978.616 rows=7790455 loops=1)
 Total runtime: 54123.487 ms
(6 rows)

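In my configuration I have set the memory limit to work_mem = 128MB

Any ideas on how to optimize a query that uses GROUP BY over an entire table? This seems impractically slow, since even flat-file storage would be faster.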
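One way the disk-activity suspicion could be checked is the BUFFERS option of EXPLAIN, which reports how many pages were found in shared buffers and how many had to be read in. This is only a sketch: the query is the one from above and nothing else is assumed.

```sql
-- Re-run the same query with buffer statistics.
-- "shared hit"  = pages already in shared_buffers
-- "shared read" = pages requested from the OS (page cache or disk)
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(status), status
FROM oks
GROUP BY status
ORDER BY status;
```

A large "read" figure on the Seq Scan node would suggest that most of the 54 seconds goes to pulling the table in from outside shared buffers rather than to the aggregation itself.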

Edit: By modifying the Postgres configuration file, I was able to get the query to run in a fraction of a second. Specifically, I set

fsync = off
synchronous_commit = off
full_page_writes = off
commit_delay = 2000
effective_cache_size = 4GB
work_mem = 512MB
maintenance_work_mem = 512MB

I'm not sure whether these values are optimal, but these options work for my case. I think fsync = off helped the most.
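To confirm which of these values are actually in effect after a reload or restart, the pg_settings view can be queried. This is a minimal sketch that uses only the setting names listed above.

```sql
-- Show the current value, unit, and origin of each setting changed above.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('fsync', 'synchronous_commit', 'full_page_writes',
               'commit_delay', 'effective_cache_size',
               'work_mem', 'maintenance_work_mem')
ORDER BY name;

-- work_mem can also be raised for a single session instead of server-wide.
SET work_mem = '512MB';
```

The source column shows whether a value comes from the configuration file, a session-level SET, or the built-in default, which may help narrow down which change made the difference.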

1 Answer:

Answer 0: (score: 1)

Try using cstore. It is a column-store "table".

Info: https://github.com/citusdata/cstore_fdw

How to use cstore: https://stackoverflow.com/questions/29970937/psql-using-cstore-table-for-aggregation-big-data
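A rough sketch of what that could look like for the table in the question, based on the setup steps in the cstore_fdw README. It assumes the extension is already installed and listed in shared_preload_libraries, and that the installed version supports INSERT INTO ... SELECT. The names cstore_server and oks_cstore are placeholders chosen for illustration, and only the columns needed for this query are copied.

```sql
-- Assumes cstore_fdw is installed and loaded via shared_preload_libraries.
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- oks_cstore is a hypothetical name; the columns mirror part of oks.
CREATE FOREIGN TABLE oks_cstore (
    id     integer,
    uname  text,
    status character(1)
)
SERVER cstore_server
OPTIONS (compression 'pglz');

-- Bulk-load from the existing table (cstore tables do not support UPDATE or DELETE).
INSERT INTO oks_cstore (id, uname, status)
SELECT id, uname, status FROM oks;

-- The same aggregate, now reading only the status column from columnar storage.
SELECT COUNT(status), status
FROM oks_cstore
GROUP BY status
ORDER BY status;
```

Because the columnar copy is append-only, it would have to be reloaded or appended to whenever oks changes, so this fits best when the counts are read far more often than the data is written.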