Referential integrity & tuning: replacing complex natural keys with surrogate keys & indexes in PostgreSQL

Time: 2017-12-26 19:21:28

Tags: sql database postgresql performance referential-integrity

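I've been tasked with improving the performance of a staging database I originally created, particularly when deleting data.

I originally created the tables using (essentially) the natural keys of the data source from which the data is extracted. As a result, the keys are multi-part and quite complex.

To speed things up, I created surrogate primary and foreign keys and wrote a function to populate them so that joins would be simpler. I also created keys to the root of the hierarchy in an attempt to speed up deleting an entire "batch".

However, merely populating the surrogate foreign keys is extremely inefficient, and I don't know why or what to do about it.

For each table, I followed this pattern: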

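  1. Drop the existing composite primary and foreign keys.
  2. If the table has children (or, in some cases, is self-referencing), add a new id column of type SERIAL and designate it as the new primary key.
  3. Create a unique index, named with the ux_ prefix, covering all of the fields of the former composite primary key.
  4. If the table has a parent and/or is self-referencing, add one or more fields of type INTEGER, named with the _id suffix.
  5. Add foreign keys, named with the fk_ prefix, linking the new integer fields to the id field of the parent table.
  6. Create indexes on the foreign key fields in the child table.
  7. Create indexes on the fields in the child table that originally made up the natural foreign key.
  8. Add a foreign key to the root of the hierarchy with ON DELETE CASCADE.
  9. Add an index on the foreign key to the root of the hierarchy.

The result may be overly complicated, but bear with me.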
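
The pattern above can be sketched on a small, self-contained example. Everything in this sketch is hypothetical: the demo schema, the parent_tbl and child_tbl tables, their two-column natural key, and the constraint and index names are illustrative assumptions that only mimic the naming used in this question. It is not the DDL that was actually run.

```sql
-- Hypothetical setup: a root table, a parent, and a child keyed by natural keys.
CREATE SCHEMA demo;
CREATE TABLE demo.batch (batch timestamp PRIMARY KEY);
CREATE TABLE demo.parent_tbl (
    batch   timestamp NOT NULL,
    request integer   NOT NULL,
    CONSTRAINT pk_parent_tbl PRIMARY KEY (batch, request)
);
CREATE TABLE demo.child_tbl (
    batch       timestamp    NOT NULL,
    request     integer      NOT NULL,
    fiscal_year character(4) NOT NULL,
    CONSTRAINT pk_child_tbl PRIMARY KEY (batch, request, fiscal_year),
    CONSTRAINT fk_child_tbl_parent_tbl FOREIGN KEY (batch, request)
        REFERENCES demo.parent_tbl (batch, request)
);

-- Step 1: drop the composite keys (the foreign key first, since it depends
-- on the parent's primary key).
ALTER TABLE demo.child_tbl  DROP CONSTRAINT fk_child_tbl_parent_tbl;
ALTER TABLE demo.child_tbl  DROP CONSTRAINT pk_child_tbl;
ALTER TABLE demo.parent_tbl DROP CONSTRAINT pk_parent_tbl;

-- Step 2: the parent has children, so it gets a SERIAL surrogate primary key.
ALTER TABLE demo.parent_tbl ADD COLUMN id SERIAL PRIMARY KEY;

-- Step 3: unique indexes over the former composite primary keys.
CREATE UNIQUE INDEX ux_parent_tbl ON demo.parent_tbl (batch, request);
CREATE UNIQUE INDEX ux_child_tbl  ON demo.child_tbl (batch, request, fiscal_year);

-- Step 4: integer column on the child that will hold the parent's id.
ALTER TABLE demo.child_tbl ADD COLUMN parent_tbl_id integer;

-- Step 5: surrogate foreign key from child to parent.
ALTER TABLE demo.child_tbl
    ADD CONSTRAINT fk_child_tbl_parent_tbl FOREIGN KEY (parent_tbl_id)
    REFERENCES demo.parent_tbl (id);

-- Step 6: index on the surrogate foreign key column.
CREATE INDEX fkix_child_tbl_parent_tbl ON demo.child_tbl (parent_tbl_id);

-- Step 7: index on the columns that used to form the natural foreign key.
CREATE INDEX ix_child_tbl_parent_tbl ON demo.child_tbl (batch, request);

-- Steps 8 and 9: cascading foreign key to the root of the hierarchy, plus its index.
ALTER TABLE demo.child_tbl
    ADD CONSTRAINT fk_child_tbl_batch FOREIGN KEY (batch)
    REFERENCES demo.batch (batch) ON DELETE CASCADE;
CREATE INDEX fkix_child_tbl_batch ON demo.child_tbl (batch);
```

After these statements, parent_tbl_id is still NULL in every child row. Filling it in requires a join on the old natural key columns, which is the step this question is about.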

Here is an example of a leaf table:

    CREATE TABLE staging.cost_years
    (
        batch timestamp without time zone NOT NULL,
        request_packet_sent_timestamp timestamp without time zone NOT NULL,
        request integer NOT NULL,
        program character varying(256) NOT NULL,
        estimate_type character varying(10) NOT NULL,
        effective_date date NOT NULL,
        subprogram character varying(256) NOT NULL,
        referenced_estimate character varying(25) NOT NULL,
        symbol character varying(14),
        fiscal_year character(4) NOT NULL,
        <unimportant_stuff>
        cost_subprograms_id integer,
        cost_accounts_id integer,
        CONSTRAINT fk_cost_accounts_cost_subprograms FOREIGN KEY (cost_subprograms_id)
            REFERENCES staging.cost_subprograms (id) MATCH SIMPLE
            ON UPDATE NO ACTION
            ON DELETE NO ACTION,
        CONSTRAINT fk_cost_years_batch FOREIGN KEY (batch)
            REFERENCES staging.batch (batch) MATCH SIMPLE
            ON UPDATE NO ACTION
            ON DELETE CASCADE,
        CONSTRAINT fk_cost_years_cost_accounts FOREIGN KEY (cost_accounts_id)
            REFERENCES staging.cost_accounts (id) MATCH SIMPLE
            ON UPDATE NO ACTION
            ON DELETE NO ACTION
    );
    
    -- Index: fkix_cost_years_batch
    
    CREATE INDEX fkix_cost_years_batch
        ON staging.cost_years USING btree
        (batch)
        TABLESPACE pg_default;
    
    -- Index: fkix_cost_years_cost_accounts
    
    CREATE INDEX fkix_cost_years_cost_accounts
        ON staging.cost_years USING btree
        (cost_accounts_id)
        TABLESPACE pg_default;
    
    -- Index: fkix_cost_years_cost_subprograms
    
    CREATE INDEX fkix_cost_years_cost_subprograms
        ON staging.cost_years USING btree
        (cost_subprograms_id)
        TABLESPACE pg_default;
    
    -- Index: ix_cost_years_cost_accounts
    
    CREATE INDEX ix_cost_years_cost_accounts
        ON staging.cost_years USING btree
        (batch, request_packet_sent_timestamp, request, program, estimate_type,
         effective_date, subprogram, referenced_estimate, symbol)
        TABLESPACE pg_default;
    
    -- Index: ix_cost_years_cost_subprograms
    
    CREATE INDEX ix_cost_years_cost_subprograms
        ON staging.cost_years USING btree
        (batch, request_packet_sent_timestamp, request, program, estimate_type,
         effective_date, subprogram, referenced_estimate)
        TABLESPACE pg_default;
    
    -- Index: ux_cost_years
    
    CREATE UNIQUE INDEX ux_cost_years
        ON staging.cost_years USING btree
        (batch, request_packet_sent_timestamp, request, program, estimate_type,
        effective_date, subprogram, referenced_estimate, symbol, fiscal_year)
        TABLESPACE pg_default;
    

One of the queries I use to try to populate the surrogate keys is:

    -- cost_years to cost_subprograms
    UPDATE
        staging.cost_years
    SET
        cost_subprograms_id = cost_subprograms.id
    FROM
        staging.cost_subprograms
    WHERE
        cost_years.batch = staging.cost_subprograms.batch
        AND cost_years.request_packet_sent_timestamp = staging.cost_subprograms.request_packet_sent_timestamp
        AND cost_years.request = staging.cost_subprograms.request
        AND cost_years.program = staging.cost_subprograms.program
        AND cost_years.estimate_type = staging.cost_subprograms.estimate_type
        AND cost_years.effective_date = staging.cost_subprograms.effective_date
        AND cost_years.subprogram = staging.cost_subprograms.subprogram
        AND cost_years.referenced_estimate = staging.cost_subprograms.referenced_estimate
        AND cost_years.batch = COALESCE(batch_in, cost_years.batch);
    

This takes forever, and when I run EXPLAIN on it, it ignores the index I added specifically for this purpose (ix_cost_years_cost_subprograms) and performs a nested loop instead:

    '[
      {
        "Execution Time": 321003.489,
        "Planning Time": 2.452,
        "Plan": {
          "Plans": [
            {
              "Plans": [
                {
                  "Filter": "(batch = batch)",
                  "Node Type": "Seq Scan",
                  "Relation Name": "cost_years",
                  "Alias": "cost_years",
                  "Parallel Aware": false,
                  "Actual Rows": 1567083,
                  "Parent Relationship": "Outer",
                  "Rows Removed by Filter": 0,
                  "Actual Loops": 1
                },
                {
                  "Scan Direction": "Forward",
                  "Rows Removed by Index Recheck": 0,
                  "Node Type": "Index Scan",
                  "Index Cond": "(
        (batch = cost_years.batch) AND
        (request_packet_sent_timestamp = cost_years.request_packet_sent_timestamp) AND
        (request = cost_years.request) AND
        ((program)::text = (cost_years.program)::text) AND
        ((estimate_type)::text = (cost_years.estimate_type)::text) AND
        (effective_date = cost_years.effective_date) AND
        ((subprogram)::text = (cost_years.subprogram)::text) AND 
        ((referenced_estimate)::text = (cost_years.referenced_estimate)::text))",
                  "Relation Name": "cost_subprograms",
                  "Alias": "cost_subprograms",
                  "Parallel Aware": false,
                  "Actual Rows": 1,
                  "Parent Relationship": "Inner",
                  "Actual Loops": 1567083,
                  "Index Name": "ux_cost_subprograms"
                }
              ],
              "Node Type": "Nested Loop",
              "Join Type": "Inner",
              "Parallel Aware": false,
              "Actual Rows": 1567083,
              "Parent Relationship": "Member",
              "Actual Loops": 1
            }
          ],
          "Node Type": "ModifyTable",
          "Relation Name": "cost_years",
          "Alias": "cost_years",
          "Parallel Aware": false,
          "Actual Rows": 0,
          "Operation": "Update",
          "Actual Loops": 1
        },
        "Triggers": []
      }
    ]'
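
A plan of this shape can be captured with EXPLAIN's ANALYZE and FORMAT JSON options. The sketch below is an assumption about how output like the above might be reproduced, not the exact command that was run. The timestamp literal is hypothetical and stands in for the function's batch_in parameter. Because EXPLAIN ANALYZE actually executes the UPDATE, the statement is wrapped in a transaction that is rolled back.

```sql
BEGIN;

-- ANALYZE runs the statement and reports actual row counts and timings;
-- FORMAT JSON produces machine-readable output like the plan shown above.
EXPLAIN (ANALYZE, FORMAT JSON)
UPDATE
    staging.cost_years
SET
    cost_subprograms_id = cost_subprograms.id
FROM
    staging.cost_subprograms
WHERE
    cost_years.batch = cost_subprograms.batch
    AND cost_years.request_packet_sent_timestamp = cost_subprograms.request_packet_sent_timestamp
    AND cost_years.request = cost_subprograms.request
    AND cost_years.program = cost_subprograms.program
    AND cost_years.estimate_type = cost_subprograms.estimate_type
    AND cost_years.effective_date = cost_subprograms.effective_date
    AND cost_years.subprogram = cost_subprograms.subprogram
    AND cost_years.referenced_estimate = cost_subprograms.referenced_estimate
    -- Hypothetical literal in place of the batch_in function parameter.
    AND cost_years.batch = COALESCE(TIMESTAMP '2017-12-01 00:00:00', cost_years.batch);

-- Discard the changes made while gathering the plan.
ROLLBACK;
```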
    

I know it's a lot to ask, but can anyone quickly see why it won't try to use ix_cost_years_cost_subprograms? Any help is greatly appreciated.

0 answers