Question

I'm using COPY to insert large batches of data into our database from CSVs. The insert looks something like this:

-- This tmp table will contain all the items that we want to try to insert
CREATE TEMP TABLE tmp_items
(
    field1 INTEGER NULL,
    field2 INTEGER NULL,
    ...
) ON COMMIT DROP;

COPY tmp_items(
    field1,
    field2,
    ...
) FROM 'path\to\data.csv' WITH (FORMAT csv);

-- Start inserting some items
WITH newitems AS (
    INSERT INTO items (field1, field2)
    SELECT tmpi.field1, tmpi.field2
    FROM tmp_items tmpi
    WHERE some condition

    -- Return the new id and other fields to the next step
    RETURNING id AS newid, field1 AS field1
)
-- Insert the result into another temp table
INSERT INTO tmp_newitems SELECT * FROM newitems;

-- Use tmp_newitems to update other tables
etc....

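Then there are multiple inserts into multiple tables using the data in tmp_items. We check for duplicates and manipulate the data in a few ways before inserting, so not everything in tmp_items is used or inserted as is. We do this with a combination of CTEs and more temporary tables.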

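This works very well and is fast enough for our needs. We do loads of these, and the problem we have is that pg_attribute becomes very bloated quite quickly, and autovacuum doesn't seem to be able to keep up (and consumes a lot of CPU).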

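My questions are:

Is it possible to perform this kind of insert without using temp tables?
If not, should we just make autovacuum of pg_attribute more aggressive? Won't that take up as much or more CPU?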

Answer 1

The best solution would be to create your temporary tables at session start with

CREATE TEMPORARY TABLE ... (
   ...
) ON COMMIT DELETE ROWS;

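Then the temporary tables will be kept for the duration of the session, but emptied at every commit.

This will reduce the bloat of pg_attribute considerably, and bloating should no longer be a problem.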
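
As a rough sketch of how this could fit the workflow from the question: the table is created once per session, and each batch then runs in its own transaction. The column list, the CSV path and the WHERE condition below are placeholders for illustration, not taken from a real schema.

```sql
-- Once per session (for example right after connecting).
-- Hypothetical column list; use the real columns of your CSV.
CREATE TEMPORARY TABLE tmp_items
(
    field1 INTEGER NULL,
    field2 INTEGER NULL
) ON COMMIT DELETE ROWS;

-- Per batch: the table already exists, so this transaction
-- does not add new column entries to pg_attribute.
BEGIN;

COPY tmp_items (field1, field2)
FROM '/path/to/data.csv' WITH (FORMAT csv);  -- placeholder path

INSERT INTO items (field1, field2)
SELECT tmpi.field1, tmpi.field2
FROM tmp_items tmpi
WHERE tmpi.field1 IS NOT NULL;  -- placeholder for the real condition

COMMIT;  -- tmp_items is emptied here and can be reused by the next batch
```

Note that the COPY and the inserts have to run inside one explicit transaction. With autocommit, the rows would already be gone after the COPY statement commits.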

You could also join the dark side (be warned, this is unsupported):

Start PostgreSQL with
```
pg_ctl start -o -O
```
so that you can modify system catalogs.

Connect as superuser and run

UPDATE pg_catalog.pg_class
SET reloptions = ARRAY['autovacuum_vacuum_cost_delay=0']
WHERE oid = 'pg_catalog.pg_attribute'::regclass;

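Now autovacuum will run more aggressively on pg_attribute, and that will probably take care of your problem.

Mind that the setting will be gone after a major upgrade.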
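
To check that the change took effect and to see whether autovacuum keeps up afterwards, something along these lines should work. It is only a sketch that reads the standard catalog and statistics views and changes nothing.

```sql
-- Confirm that the storage parameter is recorded for pg_attribute.
SELECT relname, reloptions
FROM pg_catalog.pg_class
WHERE oid = 'pg_catalog.pg_attribute'::regclass;

-- Watch dead rows and the most recent autovacuum run on the catalog table.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, autovacuum_count
FROM pg_catalog.pg_stat_sys_tables
WHERE schemaname = 'pg_catalog'
  AND relname = 'pg_attribute';
```

If n_dead_tup keeps growing between runs, autovacuum is still falling behind and reducing the number of temporary tables created is the more reliable fix.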

Answer 2

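I know this is an old question, but somebody might find my help useful here in the future.

We are very heavy on temp tables, with >500 rps and async I/O via nodejs, and because of that we experienced very heavy bloating of pg_attribute. All you are left with is very aggressive vacuuming, which halts performance. None of the answers given here solve this, because dropping and recreating temp tables bloats pg_attribute heavily, so one sunny morning you will find database performance dead and pg_attribute at 200+ GB while your actual data is around 10 GB.

So the solution is elegantly this: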

create temp table if not exists my_temp_table (description) on commit delete rows;

This way you can keep using temp tables, spare your pg_attribute, avoid the dark side of heavy vacuuming, and get the desired performance.

Don't forget

vacuum full pg_depend;
vacuum full pg_attribute;
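
To see how much space those two commands reclaim, you could run a size check like the one below before and after them. This is just a sketch using the built-in size functions. Keep in mind that VACUUM FULL takes an exclusive lock on the table while it rewrites it, so pick a quiet moment.

```sql
-- Total on-disk size (table, indexes, TOAST) of the two catalog tables.
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_catalog.pg_class c
WHERE c.oid IN ('pg_catalog.pg_attribute'::regclass,
                'pg_catalog.pg_depend'::regclass)
ORDER BY pg_total_relation_size(c.oid) DESC;
```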

Cheers :)

Temp tables bloating pg_attribute
