Question

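I am trying to move bytea data from one table to another, updating the references in a single query.

To do that, I would like the query used for the insert to return data that is not itself inserted.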

INSERT INTO file_data (data)
  select image from task_log where image is not null
RETURNING id as file_data_id, task_log.id as task_log_id

But I get an error for that query:

[42P01] ERROR: missing FROM-clause entry for table "task_log"

I would like to do something like this:

WITH inserted AS (
  INSERT INTO file_data (data)
    SELECT image FROM task_log WHERE image IS NOT NULL
  RETURNING id AS file_data_id, task_log.id AS task_log_id
)
UPDATE task_log
SET    task_log.attachment_id = inserted.file_data_id,
       task_log.attachment_type = 'INLINE_IMAGE'
FROM   inserted
WHERE  inserted.task_log_id = task_log.id;

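But I don't have access to all the data used for the insert: I can't return the id from the subselect.

I was inspired by this answer on how to do this with Common Table Expressions, but I couldn't find a way to make it work.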

Answer 1

You need to get table names and aliases right. Also, the connection between the two tables is the column image (data in the new table file_data):

WITH inserted AS (
  INSERT INTO file_data (data)
  SELECT image
  FROM   task_log
  WHERE  image IS NOT NULL
  RETURNING id, data  -- can only reference target row
)
UPDATE task_log t
SET    attachment_id = i.id
     , attachment_type = 'INLINE_IMAGE'
FROM   inserted i
WHERE  t.image = i.data;

As I explained in the older answer referenced above, image must be unique in task_log for this to work:

Insert data and set foreign keys with Postgres

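In the referenced answer I added a technique for disambiguating non-unique values. I am not sure whether you want duplicate images in file_data, though.

In the RETURNING clause of an INSERT you can only reference columns from the inserted row. The manual:

The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted (...) However, any expression using the table's columns is allowed.

Bold emphasis mine.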
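To illustrate the quoted passage, here is a minimal sketch. It assumes the file_data and task_log tables from the question; the alias data_bytes is only there for illustration. Expressions over columns of the target table are accepted, while a column of the source table is not:

```sql
-- Works: id and data are columns of the target table file_data,
-- and any expression over them is allowed in RETURNING.
INSERT INTO file_data (data)
SELECT image
FROM   task_log
WHERE  image IS NOT NULL
RETURNING id, octet_length(data) AS data_bytes;

-- Fails with "missing FROM-clause entry for table task_log":
-- task_log is only visible inside the SELECT, not in RETURNING.
-- INSERT INTO file_data (data)
-- SELECT image FROM task_log WHERE image IS NOT NULL
-- RETURNING id, task_log.id;
```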

Fold duplicate source values

If you want distinct entries in the target table of the INSERT (file_data), all you need in this case is DISTINCT in the SELECT:

WITH inserted AS (
  INSERT INTO file_data (data)
  SELECT DISTINCT image  -- fold duplicates
  FROM   task_log
  WHERE  image IS NOT NULL
  RETURNING id, data  -- can only reference target row
)
UPDATE task_log t
SET    attachment_id = i.id
     , attachment_type = 'INLINE_IMAGE'
FROM   inserted i
WHERE  t.image = i.data;

The resulting file_data.id is used multiple times in task_log. Be aware that multiple rows in task_log now point to the same image in file_data. Careful with updates and deletes ...
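Before updating or deleting a row in file_data, it may help to check how many task_log rows share it. A minimal sketch, assuming the attachment_id column used in the queries above (the alias ref_count is made up for this example):

```sql
-- List every file_data id that is referenced by more than one task_log row.
SELECT attachment_id, count(*) AS ref_count
FROM   task_log
WHERE  attachment_id IS NOT NULL
GROUP  BY attachment_id
HAVING count(*) > 1
ORDER  BY ref_count DESC;
```

Any id that shows up here is shared, so changing or removing that image affects all of the listed task_log rows at once.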

Answer 2

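I needed to copy the duplicates as well, so I ended up adding a temporary column that holds the id of the source row used for each inserted data row.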

alter table file_data add column task_log_id bigint;
-- insert & update data
alter table file_data drop column task_log_id;

The complete move script is:

-- A new table for any file data
CREATE TABLE file_data (
  id         BIGSERIAL PRIMARY KEY,
  data  bytea
);

-- Move data from task_log to bytes

-- Create new columns to reference file_data
alter table task_log add column attachment_type VARCHAR(50);
alter table task_log add column attachment_id bigint REFERENCES file_data;

-- add a temp column for the task_id used for the insert
alter table file_data add column task_log_id bigint;

-- insert data into file_data and set references
with inserted as (
  INSERT INTO file_data (data, task_log_id)
    select image, id from task_log where image is not null
  RETURNING id, task_log_id
)
UPDATE task_log
SET   attachment_id = inserted.id,
      attachment_type = 'INLINE_IMAGE'
FROM  inserted
where inserted.task_log_id = task_log.id;
-- delete the temp column
alter table file_data drop column task_log_id;
-- delete task_log images
alter table task_log drop column image;

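Since this leaves some dead data behind, I ran VACUUM FULL afterwards to clean up.

But let me repeat @ErwinBrandstetter's warning:

Performance is much worse than for the method with serial numbers proposed in the linked answer. Adding & removing a column requires owner's privileges, a full table rewrite and exclusive locks on the table, which is poison for concurrent access.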
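For comparison, a row-number based variant that avoids the temporary column might look roughly like the following. This is only a sketch, not the code from the linked answer: the CTE names sel, ins and numbered and the column rn are made up here, and it assumes that file_data.id values are handed out in the order the rows are inserted (ORDER BY rn), which holds in practice for a serial column but is not something the SQL standard promises.

```sql
WITH sel AS (
   -- number the source rows deterministically
   SELECT id, image, row_number() OVER (ORDER BY id) AS rn
   FROM   task_log
   WHERE  image IS NOT NULL
), ins AS (
   -- insert in the same order, duplicates included
   INSERT INTO file_data (data)
   SELECT image
   FROM   sel
   ORDER  BY rn
   RETURNING id
), numbered AS (
   -- number the new ids the same way (assumes ids ascend in insert order)
   SELECT id, row_number() OVER (ORDER BY id) AS rn
   FROM   ins
)
UPDATE task_log t
SET    attachment_id = n.id
     , attachment_type = 'INLINE_IMAGE'
FROM   sel s
JOIN   numbered n USING (rn)
WHERE  t.id = s.id;
```

Rows are paired by position instead of by image content, so duplicate images each get their own file_data row and no ALTER TABLE is needed. It would be worth verifying the pairing on a copy of the data before relying on it.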

Return data from subselect used in INSERT in a Common Table Expression
