How can I update 1.3 billion rows in this table more efficiently?

Time: 2017-09-02 22:24:45

Tags: postgresql sql-update large-data

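I have 1.3 billion rows in a PostgreSQL table, sku_comparison, that looks like this:

id1 (INTEGER) | id2 (INTEGER) | (10 SMALLINT columns) | length1 (SMALLINT) | length2 (SMALLINT) | length_difference (SMALLINT)

The id1 and id2 columns reference a table called sku, which contains about 300,000 rows. Each sku row has an associated varchar(25) value in a column named code.

There is a btree index on id1, another on id2, and a composite index on (id1, id2) in sku_comparison. There is also a btree index on the id column of sku.

My goal is to update the length1 and length2 columns with the lengths of the corresponding code values from the sku table. However, I ran the following statement for over 20 hours and it did not finish the update: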

UPDATE sku_comparison SET length1=length(sku.code) FROM sku 
WHERE sku_comparison.id1=sku.id;

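All of the data is stored on a single hard drive in a local machine, and the processor is fairly modern. Building this table, which required much more complex string comparisons in Python, took only about 30 hours, so I am not sure why something like this should take so long.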

Edit: here are the formatted table definitions:

                                     Table "public.sku"
   Column   |         Type          |                    Modifiers                     
------------+-----------------------+--------------------------------------------------
 id         | integer               | not null default nextval('sku_id_seq'::regclass)
 sku        | character varying(25) | 
 pattern    | character varying(25) | 
 pattern_an | character varying(25) | 
 firsttwo   | character(2)          | default '  '::bpchar
 reference  | character varying(25) | 
Indexes:
    "sku_pkey" PRIMARY KEY, btree (id)
    "sku_sku_idx" UNIQUE, btree (sku)
    "sku_firstwo_idx" btree (firsttwo)
Referenced by:
    TABLE "sku_comparison" CONSTRAINT "sku_comparison_id1_fkey" FOREIGN KEY (id1) REFERENCES sku(id)
    TABLE "sku_comparison" CONSTRAINT "sku_comparison_id2_fkey" FOREIGN KEY (id2) REFERENCES sku(id)


            Table "public.sku_comparison"
          Column           |   Type   |        Modifiers        
---------------------------+----------+-------------------------
 id1                       | integer  | not null
 id2                       | integer  | not null
 consec_charmatch          | smallint | 
 consec_groupmatch         | smallint | 
 consec_fieldtypematch     | smallint | 
 consec_groupmatch_an      | smallint | 
 consec_fieldtypematch_an  | smallint | 
 general_charmatch         | smallint | 
 general_groupmatch        | smallint | 
 general_fieldtypematch    | smallint | 
 general_groupmatch_an     | smallint | 
 general_fieldtypematch_an | smallint | 
 length1                   | smallint | default 0
 length2                   | smallint | default 0
 length_difference         | smallint | default '-999'::integer
Indexes:
    "sku_comparison_pkey" PRIMARY KEY, btree (id1, id2)
    "ssd_id1_idx" btree (id1)
    "ssd_id2_idx" btree (id2)
Foreign-key constraints:
    "sku_comparison_id1_fkey" FOREIGN KEY (id1) REFERENCES sku(id)
    "sku_comparison_id2_fkey" FOREIGN KEY (id2) REFERENCES sku(id)

1 answer:

Answer 0 (score: 0)

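Would you consider using an anonymous code block?

A sketch in PL/pgSQL, based on the column names in your query (untested against your data):

DO $$
DECLARE
    r record;
BEGIN
    -- Visit each sku row once and compute the length of its code once.
    FOR r IN SELECT sku.id, length(sku.code) AS code_length FROM sku LOOP
        -- PostgreSQL does not allow a table-qualified column in SET.
        UPDATE sku_comparison
        SET length1 = r.code_length
        WHERE sku_comparison.id1 = r.id;
    END LOOP;
END
$$;

This splits the single large update into one small UPDATE per sku row, and the length of sku.code is evaluated once per sku row rather than once per comparison row. Note that a DO block still runs as a single transaction unless it commits inside the loop, which PostgreSQL allows from version 11 onward.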
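If the aim is for each batch to be its own transaction, one possible variant is sketched below. It assumes PostgreSQL 11 or later and that the block is run outside an explicit transaction block, since COMMIT inside DO is only permitted in that case. The batch_size value and the variable names are hypothetical, and the column name code follows the query in the question (the posted table definition shows a column named sku instead, so adjust as needed).

```sql
DO $$
DECLARE
    batch_size CONSTANT integer := 1000;  -- hypothetical: number of sku ids per batch
    lo     integer;                       -- lower bound of the current id range
    max_id integer;                       -- largest id in sku
BEGIN
    SELECT min(id), max(id) INTO lo, max_id FROM sku;

    -- Walk the sku id range in fixed-size slices.
    WHILE lo <= max_id LOOP
        UPDATE sku_comparison AS c
        SET length1 = length(s.code)
        FROM sku AS s
        WHERE c.id1 = s.id
          AND s.id >= lo
          AND s.id <  lo + batch_size;

        -- Assumption: PostgreSQL 11+, DO not wrapped in BEGIN ... COMMIT.
        COMMIT;

        lo := lo + batch_size;
    END LOOP;
END
$$;
```

Committing per slice keeps each transaction short and means an interrupted run loses only the slice in progress. It does not reduce the total amount of row rewriting, so whether it finishes sooner than the single statement would need to be measured.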