Question

I've been tasked with improving the performance of a staging database I originally built, specifically the deletion of data.

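I originally created the tables using (essentially) the natural keys of the source data they are extracted from. As a result, the keys are multi-part and very complex.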

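To speed things up, I created surrogate primary and foreign keys and wrote a function to populate them, so that joins would be simpler. I also created keys to the root of the hierarchy in an attempt to speed up deleting an entire "batch".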
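With every table carrying a foreign key to staging.batch declared ON DELETE CASCADE, deleting an entire batch can, in principle, be a single statement against the root table. A minimal sketch, assuming staging.batch is keyed by its batch timestamp (as the REFERENCES clause in the leaf table suggests); the literal timestamp is a placeholder, not a value from the real data:

```sql
-- Delete one complete batch from the root table. Every table whose foreign key
-- to staging.batch is declared ON DELETE CASCADE (e.g. fk_cost_years_batch)
-- has its rows for this batch deleted by the same statement.
-- Each cascaded delete must locate the matching child rows, which is what an
-- index on each child's batch column (e.g. fkix_cost_years_batch) is for.
-- The timestamp below is a placeholder.
DELETE FROM staging.batch
WHERE batch = TIMESTAMP '2020-01-01 00:00:00';
```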

However, just populating the surrogate foreign keys is extremely inefficient, and I don't know why or what to do about it.

For each table, I followed this pattern:

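1. Drop the existing composite primary and foreign keys.
2. If the table has children (or, in some cases, references itself), add a new id column of type SERIAL and make it the new primary key.
3. Create a unique index named ux_<table> covering all the fields of the former composite primary key.
4. If the table has a parent and/or a self-reference, add one or more INTEGER fields named <parent>_id.
5. Add foreign keys named fk_<table>_<parent> linking the new integer fields to the parent table's id field.
6. Create an index on the foreign key fields in the child table.
7. Create an index on the fields in the child table that originally made up the natural foreign key.
8. Add a foreign key to the root of the hierarchy, with ON DELETE CASCADE.
9. Add an index on the foreign key to the root of the hierarchy.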
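For illustration, here is a sketch of how the steps for a table with children might look when applied to staging.cost_subprograms, the parent of cost_years. Several details are assumptions: the old primary-key constraint name pk_cost_subprograms is hypothetical, the columns of ux_cost_subprograms are inferred from the Index Cond in the plan further down, the names for the batch foreign key and its index simply follow the naming used in the leaf table, and the steps linking cost_subprograms to its own parent are omitted because that table is not shown.

```sql
-- Sketch only: the pattern applied to a table that has children.
-- Assumes the children's old composite foreign keys were already dropped,
-- otherwise dropping the primary key would fail.
-- pk_cost_subprograms is a hypothetical name for the old composite primary key.
ALTER TABLE staging.cost_subprograms
    DROP CONSTRAINT IF EXISTS pk_cost_subprograms;

-- Surrogate key. Adding a SERIAL column to an existing table assigns
-- sequence values to all current rows.
ALTER TABLE staging.cost_subprograms
    ADD COLUMN id SERIAL PRIMARY KEY;

-- Unique index over the former composite key
-- (column list inferred from the Index Cond in the plan).
CREATE UNIQUE INDEX ux_cost_subprograms
    ON staging.cost_subprograms USING btree
    (batch, request_packet_sent_timestamp, request, program, estimate_type,
     effective_date, subprogram, referenced_estimate);

-- Foreign key to the root of the hierarchy; deleting a batch cascades here.
ALTER TABLE staging.cost_subprograms
    ADD CONSTRAINT fk_cost_subprograms_batch FOREIGN KEY (batch)
        REFERENCES staging.batch (batch)
        ON DELETE CASCADE;

-- Index on that foreign key column; PostgreSQL does not create one automatically.
CREATE INDEX fkix_cost_subprograms_batch
    ON staging.cost_subprograms USING btree
    (batch);
```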

The result is probably overcomplicated, but please bear with me.

Here is an example of a leaf table:

CREATE TABLE staging.cost_years
(
    batch timestamp without time zone NOT NULL,
    request_packet_sent_timestamp timestamp without time zone NOT NULL,
    request integer NOT NULL,
    program character varying(256) NOT NULL,
    estimate_type character varying(10) NOT NULL,
    effective_date date NOT NULL,
    subprogram character varying(256) NOT NULL,
    referenced_estimate character varying(25) NOT NULL,
    symbol character varying(14),
    fiscal_year character(4) NOT NULL,
    <unimportant_stuff>
    cost_subprograms_id integer,
    cost_accounts_id integer,
    CONSTRAINT fk_cost_accounts_cost_subprograms FOREIGN KEY (cost_subprograms_id)
        REFERENCES staging.cost_subprograms (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT fk_cost_years_batch FOREIGN KEY (batch)
        REFERENCES staging.batch (batch) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE CASCADE,
    CONSTRAINT fk_cost_years_cost_accounts FOREIGN KEY (cost_accounts_id)
        REFERENCES staging.cost_accounts (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

-- Index: fkix_cost_years_batch

CREATE INDEX fkix_cost_years_batch
    ON staging.cost_years USING btree
    (batch)
    TABLESPACE pg_default;

-- Index: fkix_cost_years_cost_accounts

CREATE INDEX fkix_cost_years_cost_accounts
    ON staging.cost_years USING btree
    (cost_accounts_id)
    TABLESPACE pg_default;

-- Index: fkix_cost_years_cost_subprograms

CREATE INDEX fkix_cost_years_cost_subprograms
    ON staging.cost_years USING btree
    (cost_subprograms_id)
    TABLESPACE pg_default;

-- Index: ix_cost_years_cost_accounts

CREATE INDEX ix_cost_years_cost_accounts
    ON staging.cost_years USING btree
    (batch, request_packet_sent_timestamp, request, program, estimate_type,
     effective_date, subprogram, referenced_estimate, symbol)
    TABLESPACE pg_default;

-- Index: ix_cost_years_cost_subprograms

CREATE INDEX ix_cost_years_cost_subprograms
    ON staging.cost_years USING btree
    (batch, request_packet_sent_timestamp, request, program, estimate_type,
     effective_date, subprogram, referenced_estimate)
    TABLESPACE pg_default;

-- Index: ux_cost_years

CREATE UNIQUE INDEX ux_cost_years
    ON staging.cost_years USING btree
    (batch, request_packet_sent_timestamp, request, program, estimate_type,
    effective_date, subprogram, referenced_estimate, symbol, fiscal_year)
    TABLESPACE pg_default;

One of the queries I use to try to populate the surrogate keys is:

-- cost_years to cost_subprograms
UPDATE
    staging.cost_years
SET
    cost_subprograms_id = cost_subprograms.id
FROM
    staging.cost_subprograms
WHERE
    cost_years.batch = staging.cost_subprograms.batch
    AND cost_years.request_packet_sent_timestamp = staging.cost_subprograms.request_packet_sent_timestamp
    AND cost_years.request = staging.cost_subprograms.request
    AND cost_years.program = staging.cost_subprograms.program
    AND cost_years.estimate_type = staging.cost_subprograms.estimate_type
    AND cost_years.effective_date = staging.cost_subprograms.effective_date
    AND cost_years.subprogram = staging.cost_subprograms.subprogram
    AND cost_years.referenced_estimate = staging.cost_subprograms.referenced_estimate
    AND cost_years.batch = COALESCE(batch_in, cost_years.batch);
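The batch_in in the last condition appears to be a parameter of the populating function mentioned earlier. A minimal sketch of how the UPDATE might sit inside such a function; the name staging.populate_cost_years_keys and the NULL default are hypothetical. When batch_in is NULL, COALESCE(batch_in, cost_years.batch) returns cost_years.batch, and because batch is NOT NULL the last condition then holds for every row, so all batches are processed:

```sql
-- Hypothetical wrapper; only the UPDATE statement comes from the question.
CREATE OR REPLACE FUNCTION staging.populate_cost_years_keys(
    batch_in timestamp without time zone DEFAULT NULL)
RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
    -- batch_in IS NULL: COALESCE yields cost_years.batch, so every row matches.
    -- batch_in set: only rows belonging to that batch are updated.
    UPDATE
        staging.cost_years
    SET
        cost_subprograms_id = cost_subprograms.id
    FROM
        staging.cost_subprograms
    WHERE
        cost_years.batch = cost_subprograms.batch
        AND cost_years.request_packet_sent_timestamp = cost_subprograms.request_packet_sent_timestamp
        AND cost_years.request = cost_subprograms.request
        AND cost_years.program = cost_subprograms.program
        AND cost_years.estimate_type = cost_subprograms.estimate_type
        AND cost_years.effective_date = cost_subprograms.effective_date
        AND cost_years.subprogram = cost_subprograms.subprogram
        AND cost_years.referenced_estimate = cost_subprograms.referenced_estimate
        AND cost_years.batch = COALESCE(batch_in, cost_years.batch);
END;
$$;

-- Usage (the timestamp is a placeholder):
SELECT staging.populate_cost_years_keys();
SELECT staging.populate_cost_years_keys(TIMESTAMP '2020-01-01 00:00:00');
```

This reading would be consistent with the plan that follows, where the condition shows up as "Filter": "(batch = batch)" with "Rows Removed by Filter": 0.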

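This takes forever, and when I run EXPLAIN ANALYZE on it, the planner ignores the index I added specifically for this purpose (ix_cost_years_cost_subprograms) and performs a nested loop instead: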

'[
  {
    "Execution Time": 321003.489,
    "Planning Time": 2.452,
    "Plan": {
      "Plans": [
        {
          "Plans": [
            {
              "Filter": "(batch = batch)",
              "Node Type": "Seq Scan",
              "Relation Name": "cost_years",
              "Alias": "cost_years",
              "Parallel Aware": false,
              "Actual Rows": 1567083,
              "Parent Relationship": "Outer",
              "Rows Removed by Filter": 0,
              "Actual Loops": 1
            },
            {
              "Scan Direction": "Forward",
              "Rows Removed by Index Recheck": 0,
              "Node Type": "Index Scan",
              "Index Cond": "(
    (batch = cost_years.batch) AND
    (request_packet_sent_timestamp = cost_years.request_packet_sent_timestamp) AND
    (request = cost_years.request) AND
    ((program)::text = (cost_years.program)::text) AND
    ((estimate_type)::text = (cost_years.estimate_type)::text) AND
    (effective_date = cost_years.effective_date) AND
    ((subprogram)::text = (cost_years.subprogram)::text) AND 
    ((referenced_estimate)::text = (cost_years.referenced_estimate)::text))",
              "Relation Name": "cost_subprograms",
              "Alias": "cost_subprograms",
              "Parallel Aware": false,
              "Actual Rows": 1,
              "Parent Relationship": "Inner",
              "Actual Loops": 1567083,
              "Index Name": "ux_cost_subprograms"
            }
          ],
          "Node Type": "Nested Loop",
          "Join Type": "Inner",
          "Parallel Aware": false,
          "Actual Rows": 1567083,
          "Parent Relationship": "Member",
          "Actual Loops": 1
        }
      ],
      "Node Type": "ModifyTable",
      "Relation Name": "cost_years",
      "Alias": "cost_years",
      "Parallel Aware": false,
      "Actual Rows": 0,
      "Operation": "Update",
      "Actual Loops": 1
    },
    "Triggers": []
  }
]'
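Beyond a single EXPLAIN, PostgreSQL's cumulative statistics view pg_stat_user_indexes records how often each index has been scanned, which is one way to check whether ix_cost_years_cost_subprograms is ever used by any query. A sketch:

```sql
-- Scan counts per index on staging.cost_years since statistics were last reset.
-- idx_scan = 0 means no scan has used the index in that period,
-- although the index still has to be maintained as rows are written.
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'staging'
  AND relname = 'cost_years'
ORDER BY indexrelname;
```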

I know it's a lot to ask, but does anyone see right away why it won't use ix_cost_years_cost_subprograms? Any help is greatly appreciated.

Referential integrity & tuning: replacing complex natural keys with surrogate keys & indexes in PostgreSQL
