I have a table like this:
Column | Type | Modifiers
-------------+-----------------------------+-------------------------------------------------------
id | integer | not null default nextval('oks_id_seq'::regclass)
uname | text | not null
ess | text |
quest | text |
details | text |
status | character(1) | not null default 'q'::bpchar
last_parsed | timestamp without time zone |
qstatus | character(1) | not null default 'q'::bpchar
media_wc | integer | not null default 0
Indexes:
"oks_pkey" PRIMARY KEY, btree (id)
"oks_uname_key" UNIQUE CONSTRAINT, btree (uname)
"last_parsed_idx" btree (last_parsed)
"qstatus_idx" btree (qstatus)
"status_idx" btree (status)
And I have this query:
SELECT COUNT(status), status FROM oks GROUP BY status ORDER BY status;
which gives me this result:
count | status
---------+--------
1478472 | d
23599 | p
10178 | q
6278206 | s
(4 rows)
Which is nice, but it takes forever, and for some reason Postgres seems to keep the entire index on disk: disk activity is very high for the duration of the query. Here is the plan:
Sort (cost=1117385.91..1117385.92 rows=4 width=2) (actual time=54122.991..54122.993 rows=4 loops=1)
Sort Key: status
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=1117385.82..1117385.86 rows=4 width=2) (actual time=54122.280..54122.283 rows=4 loops=1)
-> Seq Scan on oks (cost=0.00..1078433.55 rows=7790455 width=2) (actual time=0.009..47978.616 rows=7790455 loops=1)
Total runtime: 54123.487 ms
(6 rows)
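One thing worth checking before touching the config: the plan above is a full sequential scan, even though `status_idx` exists. On PostgreSQL 9.2 or later this query can potentially be answered by an index-only scan over `status_idx`, but only for heap pages marked all-visible, so the visibility map has to be up to date. A minimal sketch (assuming 9.2+; whether the planner actually switches depends on table churn and statistics):

```sql
-- Refresh the visibility map and planner statistics so an
-- index-only scan on status_idx becomes possible.
VACUUM (ANALYZE) oks;

-- Re-check the plan; ideally it now shows
-- "Index Only Scan using status_idx on oks" instead of a Seq Scan.
EXPLAIN ANALYZE
SELECT COUNT(status), status
FROM oks
GROUP BY status
ORDER BY status;
```

If most pages are not all-visible (heavy update traffic), the planner will still fall back to the sequential scan shown above.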
In my configuration, memory usage is set with work_mem = 128MB.
Any ideas on how to optimize queries that use GROUP BY over the whole table? This seems impractically slow; a flat-file store would be faster.
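If the per-status counts are needed often, one standard workaround (not from the original post; all names below are illustrative, and `ON CONFLICT` requires PostgreSQL 9.5+) is a trigger-maintained summary table, so reads never scan the big table at all:

```sql
-- Hypothetical summary table holding one row per status value.
CREATE TABLE oks_status_counts (
    status character(1) PRIMARY KEY,
    cnt    bigint NOT NULL DEFAULT 0
);

-- Seed it once from the existing data.
INSERT INTO oks_status_counts (status, cnt)
SELECT status, COUNT(*) FROM oks GROUP BY status;

-- Keep it current on every change to oks.status.
CREATE OR REPLACE FUNCTION oks_status_counts_trg() RETURNS trigger AS $$
BEGIN
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO oks_status_counts (status, cnt) VALUES (NEW.status, 1)
        ON CONFLICT (status)
        DO UPDATE SET cnt = oks_status_counts.cnt + 1;
    END IF;
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        UPDATE oks_status_counts SET cnt = cnt - 1
        WHERE status = OLD.status;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER oks_status_counts_trg
AFTER INSERT OR UPDATE OF status OR DELETE ON oks
FOR EACH ROW EXECUTE PROCEDURE oks_status_counts_trg();
```

After that, `SELECT status, cnt FROM oks_status_counts ORDER BY status;` returns in milliseconds regardless of table size, at the cost of a small per-write overhead.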
Edit: By modifying the postgres config file I was able to get the query to run in a fraction of a second. Specifically, setting
fsync = off
synchronous_commit = off
full_page_writes = off
commit_delay = 2000
effective_cache_size = 4GB
work_mem = 512MB
maintenance_work_mem = 512MB
Not sure if these are optimal, but these options work in my case. fsync = off helped the most, I think.
Answer (score: 1)
Try using cstore. It is a column-store "table".
Info: https://github.com/citusdata/cstore_fdw
How to use cstore: https://stackoverflow.com/questions/29970937/psql-using-cstore-table-for-aggregation-big-data
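For reference, a minimal cstore_fdw setup looks roughly like this (adapted from the project's README; the server and table names are illustrative, and older cstore_fdw versions load data with `COPY` rather than `INSERT ... SELECT`):

```sql
-- Prerequisite: cstore_fdw is installed and 'cstore_fdw' is listed in
-- shared_preload_libraries in postgresql.conf.
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- Columnar copy of just the column the aggregate needs; a column
-- store only reads the columns referenced by the query.
CREATE FOREIGN TABLE oks_cstore (status character(1))
    SERVER cstore_server
    OPTIONS (compression 'pglz');

INSERT INTO oks_cstore SELECT status FROM oks;

-- The original aggregate, now against the columnar table.
SELECT COUNT(status), status FROM oks_cstore
GROUP BY status ORDER BY status;
```

The trade-off is that the columnar copy has to be refreshed when `oks` changes, so this fits append-mostly or batch-reloaded analytics workloads best.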