Speeding up a Postgres query over millions of rows?

Asked: 2015-07-28 11:00:41

Tags: performance postgresql

I am using Postgres 9.4. I have a large table. Here is its structure:

 processing_date | date                 |
 practice_id     | character varying(6) |
 chemical_id     | character varying(9) |
 items           | bigint               |
 cost            | double precision     |
Indexes:
    "vw_idx_chem_by_practice_chem_id" btree (chemical_id)
    "vw_idx_chem_by_practice_chem_id_vc" btree (chemical_id varchar_pattern_ops)
    "vw_idx_chem_by_practice_joint_id" btree (practice_id, chemical_id)

Now I want to run a LIKE query on the table. Here is my query:

EXPLAIN (ANALYSE, BUFFERS) SELECT sum(pr.cost) as actual_cost, 
      sum(pr.items) as items, pr.practice_id as row_id, 
      pc.name as row_name, pr.processing_date as date 
FROM vw_chemical_summary_by_practice pr 
JOIN frontend_practice pc ON pr.practice_id=pc.code 
WHERE (pr.chemical_id LIKE '0401%' ) 
GROUP BY pr.practice_id, pc.code, date 
ORDER BY date, pr.practice_id;

Here is the EXPLAIN output: http://explain.depesz.com/s/lYRT

As you can see, it is slow, partly because it runs a bitmap heap scan over nearly 4 million rows. (The subsequent sort is also slow.)

Is there anything I can do to speed this up?

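I am wondering whether I should create a further materialized view, or whether a multicolumn index would help, so that Postgres can look at the index rather than the disk.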

Is there any way to make the sort more efficient?

Update: here is the definition of the materialized view:

    CREATE MATERIALIZED VIEW vw_chemical_summary_by_practice
    AS SELECT processing_date, practice_id, chemical_id, 
    SUM(total_items) AS items, SUM(actual_cost) AS cost
    FROM frontend_prescription
    GROUP BY processing_date, practice_id, chemical_id

The underlying table:

 id                | integer                 | not null default nextval('frontend_prescription_id_seq'::regclass)
 presentation_code | character varying(15)   | not null
 total_items       | integer                 | not null
 actual_cost       | double precision        | not null
 processing_date   | date                    | not null
 practice_id       | character varying(6)    | not null
Indexes:
    "frontend_prescription_pkey" PRIMARY KEY, btree (id)
    "frontend_prescription_528f368c" btree (processing_date)
    "frontend_prescription_6ea07fe3" btree (practice_id)
    "frontend_prescription_idx_code" btree (presentation_code varchar_pattern_ops)
    "frontend_prescription_idx_date_and_code" btree (processing_date, presentation_code)

Update 2: In case it is not clear, I need the total spending and items, by practice and by month, across all chemicals starting with 0401.

1 Answer:

Answer 0: (score: 2)

-- assuming this is your original table:
CREATE TABLE practice_chemical_old
    ( processing_date date NOT NULL
    , practice_id     character varying(6) NOT NULL
    , chemical_id     character varying(9) NOT NULL
    , items           bigint NOT NULL  -- no DEFAULT NULL: it would contradict NOT NULL
    , cost            double precision
    );

-- create these three new tables to decompose it into
CREATE TABLE practice
    ( practice_id SERIAL NOT NULL PRIMARY KEY
    , practice_name character varying(6) UNIQUE
    );
CREATE TABLE chemical
    ( chemical_id SERIAL NOT NULL PRIMARY KEY
    , chemical_name character varying(9) UNIQUE
    );

CREATE TABLE practice_chemical_new
    ( practice_id INTEGER NOT NULL REFERENCES practice (practice_id)
    , chemical_id INTEGER NOT NULL REFERENCES chemical (chemical_id)
    , processing_date date NOT NULL
    , items bigint NOT NULL default 0
    , cost double precision
            -- Not sure if processing_date should be part of the key, too
    , PRIMARY KEY (practice_id, chemical_id)
    );

CREATE UNIQUE INDEX ON practice_chemical_new(chemical_id, practice_id);

INSERT INTO practice(practice_name)
SELECT DISTINCT practice_id FROM practice_chemical_old;

INSERT INTO chemical(chemical_name)
SELECT DISTINCT chemical_id FROM practice_chemical_old;

-- now populate the new tables from the old ones ...
INSERT INTO practice_chemical_new(practice_id, chemical_id, processing_date,items,cost)
SELECT p.practice_id, c.chemical_id, pco.processing_date, pco.items, pco.cost
FROM practice_chemical_old pco
JOIN practice p ON p.practice_name = pco.practice_id
JOIN chemical c ON c.chemical_name = pco.chemical_id
    ;

-- Now,  the original table *could* be represented by the following view (or table, or table expression):
CREATE VIEW practice_chemical_fake AS
SELECT pcn.processing_date AS processing_date
    , p.practice_name AS practice_id
    , c.chemical_name AS chemical_id
    , pcn.items AS items
    , pcn.cost AS cost
FROM practice_chemical_new pcn
JOIN practice p ON p.practice_id = pcn.practice_id
JOIN chemical c ON c.chemical_id = pcn.chemical_id
    ;
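
With the decomposed tables, the prefix filter from the question can be applied to the small chemical lookup table instead of the large fact table. The following is only a sketch under assumptions: the index name chemical_name_pattern_idx is hypothetical, the join to frontend_practice for the practice's display name is left out, and whether the planner actually picks this index depends on the data and statistics.

```sql
-- Assumption: a pattern-ops index on the lookup table, so the prefix match
-- can use a btree index even when the database collation is not "C".
-- The index name is hypothetical.
CREATE INDEX chemical_name_pattern_idx
    ON chemical (chemical_name varchar_pattern_ops);

-- Totals per practice and processing_date for chemicals starting with '0401'.
SELECT pcn.processing_date AS date
    , p.practice_name AS row_id
    , SUM(pcn.cost) AS actual_cost
    , SUM(pcn.items) AS items
FROM chemical c
JOIN practice_chemical_new pcn ON pcn.chemical_id = c.chemical_id
JOIN practice p ON p.practice_id = pcn.practice_id
WHERE c.chemical_name LIKE '0401%'
GROUP BY pcn.processing_date, p.practice_id, p.practice_name
ORDER BY pcn.processing_date, p.practice_name
    ;
```

Running it under EXPLAIN (ANALYZE, BUFFERS), as in the question, would show whether the plan improves on the original bitmap heap scan.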

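Note: it is not clear from the original question whether there can be multiple instances of a {practice, chemical} pair (with different processing_date values). You may need to change the definition of the PK slightly.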
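
The materialized view in the question groups by processing_date, practice_id and chemical_id, which suggests that a {practice, chemical} pair does occur once per date. Assuming that is the case, one possible variant of the table definition puts the date into both the primary key and the reverse-order unique index; otherwise the INSERT ... SELECT that populates the table would fail on duplicate keys.

```sql
-- Variant of practice_chemical_new keyed on the date as well.
-- Assumption: at most one row per practice, chemical and processing_date.
CREATE TABLE practice_chemical_new
    ( practice_id INTEGER NOT NULL REFERENCES practice (practice_id)
    , chemical_id INTEGER NOT NULL REFERENCES chemical (chemical_id)
    , processing_date date NOT NULL
    , items bigint NOT NULL DEFAULT 0
    , cost double precision
    , PRIMARY KEY (practice_id, chemical_id, processing_date)
    );

-- Reverse-order index for lookups that start from the chemical;
-- it must include the date too, or it would reject repeated pairs.
CREATE UNIQUE INDEX ON practice_chemical_new (chemical_id, practice_id, processing_date);
```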