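I have been tasked with improving the performance of a staging database that I originally created, particularly when deleting data.
I originally created the tables using (essentially) the natural keys of the data source from which the data is extracted. As a result, the keys are multi-part and quite complex.
To speed things up, I created surrogate primary and foreign keys and wrote a function to populate them so that the joins would be simpler. I also created a key for the root of the hierarchy, to try to speed up deleting an entire "batch".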
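To make the goal concrete: deleting a whole batch is meant to be a single statement against the root table, relying on ON DELETE CASCADE foreign keys such as fk_cost_years_batch in the table definition further down. A minimal sketch, in which the batch timestamp is a made-up placeholder and the ROLLBACK is only there so the sketch changes nothing:

```sql
-- Sketch only: the timestamp literal is a hypothetical batch value.
BEGIN;

-- Child rows that reference this batch through an ON DELETE CASCADE
-- foreign key (for example staging.cost_years.batch) are removed as well.
DELETE FROM staging.batch
WHERE batch = TIMESTAMP '2020-01-01 00:00:00';

-- Undo the delete; use COMMIT instead to keep it.
ROLLBACK;
```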
However, just populating the surrogate foreign keys is very inefficient, and I don't know why or what to do about it.
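I followed the same pattern for every table. The result may be more complicated than it needs to be, but please bear with me.
Here is an example of a leaf table: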
CREATE TABLE staging.cost_years
(
batch timestamp without time zone NOT NULL,
request_packet_sent_timestamp timestamp without time zone NOT NULL,
request integer NOT NULL,
program character varying(256) NOT NULL,
estimate_type character varying(10) NOT NULL,
effective_date date NOT NULL,
subprogram character varying(256) NOT NULL,
referenced_estimate character varying(25) NOT NULL,
symbol character varying(14),
fiscal_year character(4) NOT NULL,
<unimportant_stuff>
cost_subprograms_id integer,
cost_accounts_id integer,
CONSTRAINT fk_cost_accounts_cost_subprograms FOREIGN KEY (cost_subprograms_id)
REFERENCES staging.cost_subprograms (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION,
CONSTRAINT fk_cost_years_batch FOREIGN KEY (batch)
REFERENCES staging.batch (batch) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE CASCADE,
CONSTRAINT fk_cost_years_cost_accounts FOREIGN KEY (cost_accounts_id)
REFERENCES staging.cost_accounts (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
);
-- Index: fkix_cost_years_batch
CREATE INDEX fkix_cost_years_batch
ON staging.cost_years USING btree
(batch)
TABLESPACE pg_default;
-- Index: fkix_cost_years_cost_accounts
CREATE INDEX fkix_cost_years_cost_accounts
ON staging.cost_years USING btree
(cost_accounts_id)
TABLESPACE pg_default;
-- Index: fkix_cost_years_cost_subprograms
CREATE INDEX fkix_cost_years_cost_subprograms
ON staging.cost_years USING btree
(cost_subprograms_id)
TABLESPACE pg_default;
-- Index: ix_cost_years_cost_accounts
CREATE INDEX ix_cost_years_cost_accounts
ON staging.cost_years USING btree
(batch, request_packet_sent_timestamp, request, program, estimate_type,
effective_date, subprogram, referenced_estimate, symbol)
TABLESPACE pg_default;
-- Index: ix_cost_years_cost_subprograms
CREATE INDEX ix_cost_years_cost_subprograms
ON staging.cost_years USING btree
(batch, request_packet_sent_timestamp, request, program, estimate_type,
effective_date, subprogram, referenced_estimate)
TABLESPACE pg_default;
-- Index: ux_cost_years
CREATE UNIQUE INDEX ux_cost_years
ON staging.cost_years USING btree
(batch, request_packet_sent_timestamp, request, program, estimate_type,
effective_date, subprogram, referenced_estimate, symbol, fiscal_year)
TABLESPACE pg_default;
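Since every one of these indexes that includes an updated column has to be maintained when the surrogate keys are populated, it may help to see how large each index is and how often it is actually scanned. A minimal sketch using the standard statistics view pg_stat_user_indexes; the counters are cumulative since the last statistics reset, so read them with that in mind:

```sql
-- Size and scan count for each index on staging.cost_years.
SELECT indexrelname                                 AS index_name,
       idx_scan                                     AS scans,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'staging'
  AND relname = 'cost_years'
ORDER BY pg_relation_size(indexrelid) DESC;
```

An index that shows zero scans over a representative workload is a candidate for review, though this query alone does not prove that it is safe to drop.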
One of the queries I use to try to populate the surrogate keys is:
-- cost_years to cost_subprograms
UPDATE
staging.cost_years
SET
cost_subprograms_id = cost_subprograms.id
FROM
staging.cost_subprograms
WHERE
cost_years.batch = staging.cost_subprograms.batch
AND cost_years.request_packet_sent_timestamp = staging.cost_subprograms.request_packet_sent_timestamp
AND cost_years.request = staging.cost_subprograms.request
AND cost_years.program = staging.cost_subprograms.program
AND cost_years.estimate_type = staging.cost_subprograms.estimate_type
AND cost_years.effective_date = staging.cost_subprograms.effective_date
AND cost_years.subprogram = staging.cost_subprograms.subprogram
AND cost_years.referenced_estimate = staging.cost_subprograms.referenced_estimate
AND cost_years.batch = COALESCE(batch_in, cost_years.batch);
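This takes forever. When I run EXPLAIN on it, the plan ignores the index I added specifically for this purpose (ix_cost_years_cost_subprograms) and performs a nested loop instead: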
'[
{
"Execution Time": 321003.489,
"Planning Time": 2.452,
"Plan": {
"Plans": [
{
"Plans": [
{
"Filter": "(batch = batch)",
"Node Type": "Seq Scan",
"Relation Name": "cost_years",
"Alias": "cost_years",
"Parallel Aware": false,
"Actual Rows": 1567083,
"Parent Relationship": "Outer",
"Rows Removed by Filter": 0,
"Actual Loops": 1
},
{
"Scan Direction": "Forward",
"Rows Removed by Index Recheck": 0,
"Node Type": "Index Scan",
"Index Cond": "(
(batch = cost_years.batch) AND
(request_packet_sent_timestamp = cost_years.request_packet_sent_timestamp) AND
(request = cost_years.request) AND
((program)::text = (cost_years.program)::text) AND
((estimate_type)::text = (cost_years.estimate_type)::text) AND
(effective_date = cost_years.effective_date) AND
((subprogram)::text = (cost_years.subprogram)::text) AND
((referenced_estimate)::text = (cost_years.referenced_estimate)::text))",
"Relation Name": "cost_subprograms",
"Alias": "cost_subprograms",
"Parallel Aware": false,
"Actual Rows": 1,
"Parent Relationship": "Inner",
"Actual Loops": 1567083,
"Index Name": "ux_cost_subprograms"
}
],
"Node Type": "Nested Loop",
"Join Type": "Inner",
"Parallel Aware": false,
"Actual Rows": 1567083,
"Parent Relationship": "Member",
"Actual Loops": 1
}
],
"Node Type": "ModifyTable",
"Relation Name": "cost_years",
"Alias": "cost_years",
"Parallel Aware": false,
"Actual Rows": 0,
"Operation": "Update",
"Actual Loops": 1
},
"Triggers": []
}
]'
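For anyone who wants to reproduce a plan like this one: EXPLAIN with the ANALYZE option really executes the UPDATE, so it is safest to wrap it in a transaction and roll it back. The sketch below makes a few assumptions: it uses the aliases cy and cs for readability, it leaves out the batch_in filter (presumably a parameter of the populating function, and apparently NULL in the run above, given the "(batch = batch)" filter), and it includes a commented-out SET LOCAL enable_nestloop = off for comparing against a plan without nested loops.

```sql
BEGIN;

-- Optional, for comparison only: discourage nested loops in this transaction.
-- SET LOCAL enable_nestloop = off;

EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
UPDATE staging.cost_years AS cy
SET cost_subprograms_id = cs.id
FROM staging.cost_subprograms AS cs
WHERE cy.batch = cs.batch
  AND cy.request_packet_sent_timestamp = cs.request_packet_sent_timestamp
  AND cy.request = cs.request
  AND cy.program = cs.program
  AND cy.estimate_type = cs.estimate_type
  AND cy.effective_date = cs.effective_date
  AND cy.subprogram = cs.subprogram
  AND cy.referenced_estimate = cs.referenced_estimate;

-- EXPLAIN ANALYZE ran the UPDATE for real; undo it.
ROLLBACK;
```

The BUFFERS option adds shared-buffer hit and read counts to each node, which can help show whether the time goes into the join or into writing the updated rows and their index entries.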
I know this is a lot to ask, but can anyone see right away why it won't try to use ix_cost_years_cost_subprograms? Any help is greatly appreciated.