I have a table like this:
Column | Type | Modifiers
-------------+-----------------------------+-------------------------------------------------------
id | integer | not null default nextval('oks_id_seq'::regclass)
uname | text | not null
ess | text |
quest | text |
details | text |
status | character(1) | not null default 'q'::bpchar
last_parsed | timestamp without time zone |
qstatus | character(1) | not null default 'q'::bpchar
media_wc | integer | not null default 0
Indexes:
"oks_pkey" PRIMARY KEY, btree (id)
"oks_uname_key" UNIQUE CONSTRAINT, btree (uname)
"last_parsed_idx" btree (last_parsed)
"qstatus_idx" btree (qstatus)
"status_idx" btree (status)
And I have this query:
SELECT COUNT(status), status FROM oks GROUP BY status ORDER BY status;
which gives me this result:
count | status
---------+--------
1478472 | d
23599 | p
10178 | q
6278206 | s
(4 rows)
Which is nice, but it takes forever, and for some reason Postgres seems to keep the entire index on disk: disk activity is very high for the duration of the query. Here is the plan:
Sort (cost=1117385.91..1117385.92 rows=4 width=2) (actual time=54122.991..54122.993 rows=4 loops=1)
Sort Key: status
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=1117385.82..1117385.86 rows=4 width=2) (actual time=54122.280..54122.283 rows=4 loops=1)
-> Seq Scan on oks (cost=0.00..1078433.55 rows=7790455 width=2) (actual time=0.009..47978.616 rows=7790455 loops=1)
Total runtime: 54123.487 ms
(6 rows)
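One thing worth checking before touching the config: the plan above is a full sequential scan, even though `status_idx` exists. On PostgreSQL 9.2 or later this query can potentially be answered by an index-only scan over `status_idx`, but only for heap pages marked all-visible, so the visibility map has to be up to date. A minimal sketch (assuming 9.2+; whether the planner actually switches depends on table churn and statistics):

```sql
-- Refresh the visibility map and planner statistics so an
-- index-only scan on status_idx becomes possible.
VACUUM (ANALYZE) oks;

-- Re-check the plan; ideally it now shows
-- "Index Only Scan using status_idx on oks" instead of a Seq Scan.
EXPLAIN ANALYZE
SELECT COUNT(status), status
FROM oks
GROUP BY status
ORDER BY status;
```

If most pages are not all-visible (heavy update traffic), the planner will still fall back to the sequential scan shown above.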
In my configuration, memory usage is set with work_mem = 128MB.
Any ideas on how to optimize queries that use GROUP BY over the whole table? This seems impractically slow; a flat-file store would be faster.
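If the per-status counts are needed often, one standard workaround (not from the original post; all names below are illustrative, and `ON CONFLICT` requires PostgreSQL 9.5+) is a trigger-maintained summary table, so reads never scan the big table at all:

```sql
-- Hypothetical summary table holding one row per status value.
CREATE TABLE oks_status_counts (
    status character(1) PRIMARY KEY,
    cnt    bigint NOT NULL DEFAULT 0
);

-- Seed it once from the existing data.
INSERT INTO oks_status_counts (status, cnt)
SELECT status, COUNT(*) FROM oks GROUP BY status;

-- Keep it current on every change to oks.status.
CREATE OR REPLACE FUNCTION oks_status_counts_trg() RETURNS trigger AS $$
BEGIN
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO oks_status_counts (status, cnt) VALUES (NEW.status, 1)
        ON CONFLICT (status)
        DO UPDATE SET cnt = oks_status_counts.cnt + 1;
    END IF;
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        UPDATE oks_status_counts SET cnt = cnt - 1
        WHERE status = OLD.status;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER oks_status_counts_trg
AFTER INSERT OR UPDATE OF status OR DELETE ON oks
FOR EACH ROW EXECUTE PROCEDURE oks_status_counts_trg();
```

After that, `SELECT status, cnt FROM oks_status_counts ORDER BY status;` returns in milliseconds regardless of table size, at the cost of a small per-write overhead.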
Edit: By modifying the postgres config file I was able to get the query to run in a fraction of a second. Specifically, setting
fsync = off
synchronous_commit = off
full_page_writes = off
commit_delay = 2000
effective_cache_size = 4GB
work_mem = 512MB
maintenance_work_mem = 512MB
Not sure if these are optimal, but these options work in my case. fsync = off helped the most, I think.
Answer (score: 1)
Try using cstore. It is a column-store "table".
Info: https://github.com/citusdata/cstore_fdw
How to use cstore: https://stackoverflow.com/questions/29970937/psql-using-cstore-table-for-aggregation-big-data
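For reference, a minimal cstore_fdw setup looks roughly like this (adapted from the project's README; the server and table names are illustrative, and older cstore_fdw versions load data with `COPY` rather than `INSERT ... SELECT`):

```sql
-- Prerequisite: cstore_fdw is installed and 'cstore_fdw' is listed in
-- shared_preload_libraries in postgresql.conf.
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- Columnar copy of just the column the aggregate needs; a column
-- store only reads the columns referenced by the query.
CREATE FOREIGN TABLE oks_cstore (status character(1))
    SERVER cstore_server
    OPTIONS (compression 'pglz');

INSERT INTO oks_cstore SELECT status FROM oks;

-- The original aggregate, now against the columnar table.
SELECT COUNT(status), status FROM oks_cstore
GROUP BY status ORDER BY status;
```

The trade-off is that the columnar copy has to be refreshed when `oks` changes, so this fits append-mostly or batch-reloaded analytics workloads best.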