Question

I have a table like this (simplified):

CREATE TABLE sales
(
  customer_id integer NOT NULL,
  product_ids integer[] NOT NULL,
  CONSTRAINT "PK_customer_id" PRIMARY KEY (customer_id)
);
CREATE INDEX "IDX_product_ids" ON sales USING gin(product_ids);

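The table has 5M rows, and the product_ids array contains 200 elements on average. Suppose I need to remove specific retired product IDs from every sales row. The list of retired products is generated by a subquery against the products table, which returns about 30M records.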

Right now I do this (pseudocode):

create table sales_heap ...
insert into sales_heap select customer_id, unnest(product_ids) from sales;
create index product_id on sales_heap ...
delete from sales_heap where product_id in (select product_id from products ...);
truncate table sales;
insert into sales select customer_id, array_agg(product_id) from sales_heap group by customer_id;
drop table sales_heap;
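
For comparison, the unnest / filter / re-aggregate steps above can also be written as a per-row expression, applied to one range of customer_id at a time so that no single statement touches all 5M rows. This is only a sketch: it assumes a hypothetical table retired_products(product_id) that already holds the output of the retired-product subquery, and the batch bounds are placeholders. Whether it stays within memory at this scale has not been verified here.

```sql
-- Assumption: retired_products(product_id integer PRIMARY KEY) is a
-- hypothetical table holding the retired IDs; it is not part of the
-- original schema.
UPDATE sales AS s
SET product_ids = ARRAY(
      -- Expand the row's array and keep only IDs that are not retired.
      SELECT p
      FROM unnest(s.product_ids) AS p
      WHERE NOT EXISTS (
        SELECT 1
        FROM retired_products AS r
        WHERE r.product_id = p
      )
    )
-- Hypothetical batch bounds; advance the range and repeat per batch.
WHERE s.customer_id BETWEEN 1 AND 100000;
```

ARRAY(subquery) yields an empty array rather than NULL when every element is filtered out, so the NOT NULL constraint on product_ids still holds. Element order is not guaranteed to be preserved by this form, and every updated row also has to be maintained in the GIN index, so the cost of each batch should be measured before relying on it.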

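I have also tried a few CTE-based and nested FOR-loop approaches, but each time I ran into out-of-memory errors (8 GB work_mem). The fact that array_remove does not accept an array of items to remove is also a pain.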
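
To illustrate the array_remove limitation: the function takes one element per call, so removing a set of values means expanding the array, filtering it, and rebuilding it. A minimal sketch with literal arrays, where ARRAY[2, 4] is a stand-in for the retired-product list:

```sql
-- array_remove(anyarray, anyelement) removes all occurrences of ONE value.
SELECT array_remove(ARRAY[1, 2, 3, 2], 2);   -- {1,3}

-- Removing several values: unnest, filter, and rebuild the array.
-- WITH ORDINALITY (PostgreSQL 9.4+) keeps the original element order.
SELECT ARRAY(
  SELECT p
  FROM unnest(ARRAY[1, 2, 3, 4, 5]) WITH ORDINALITY AS t(p, ord)
  WHERE p <> ALL (ARRAY[2, 4])               -- stand-in for the retired IDs
  ORDER BY ord
);                                           -- {1,3,5}
```

The <> ALL form only suits a short literal list; for a list of about 30M IDs the filter would have to be a join or NOT EXISTS against a table instead.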

Can anyone think of a more elegant way to do this without hitting the memory limit?

PostgreSQL: remove elements from an array column that match a (large) subquery

0 answers: