I have a table like this (simplified):
CREATE TABLE sales
(
    customer_id integer NOT NULL,
    product_ids integer[] NOT NULL,
    CONSTRAINT "PK_customer_id" PRIMARY KEY (customer_id)
);
CREATE INDEX "IDX_product_ids" ON sales USING gin(product_ids);
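The table has 5M rows, and each product_ids array contains 200 elements on average. Suppose I need to remove specific retired product IDs from every sales row. The list of retired products comes from a subquery against the products table, which returns about 30M records.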
Right now I do it like this (pseudocode):
create table sales_heap ...
insert into sales_heap select customer_id, unnest(product_ids) from sales;
create index product_id on sales_heap ...
delete from sales_heap where product_id in (select product_id from products ...);
truncate table sales;
insert into sales select customer_id, array_agg(product_id) from sales_heap group by customer_id;
drop table sales_heap;
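I also tried some CTE-based approaches and nested FOR loops, but each time I ran into out-of-memory errors (8 GB work_mem). It is also painful that array_remove does not accept an array of items to remove, only a single element.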
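To make the array_remove limitation concrete, a helper of roughly the following shape would be needed. This is only a sketch: array_subtract is a hypothetical name, not a built-in function, and the code assumes PostgreSQL 9.4 or later (for WITH ORDINALITY) and no NULL elements in either array.

```sql
-- Hypothetical helper (not a built-in): removes every element of "remove"
-- from "source", keeping the order and duplicates of the remaining elements.
CREATE FUNCTION array_subtract(source integer[], remove integer[])
RETURNS integer[]
LANGUAGE sql IMMUTABLE
AS $$
    SELECT COALESCE(array_agg(elem ORDER BY ord), '{}')
    FROM unnest(source) WITH ORDINALITY AS t(elem, ord)
    WHERE elem <> ALL (remove);   -- assumes "remove" contains no NULLs
$$;

-- Example call on toy data
SELECT array_subtract(ARRAY[1, 2, 3, 4, 2], ARRAY[2, 4]);   -- {1,3}
```

With about 30M retired IDs, passing them as a single array argument is unlikely to be practical, so this illustrates the missing operation rather than a solution to the memory problem.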
Can anyone think of a more elegant way to do this without hitting the memory limit?