Question

I have two tables:

books:

id | tags
---+---------------------------------------------------
1  | [philosophy.plato]
2  | [literature.history]
3  | [cultural_history.18th_century.16th_century.history]

tags:

id | name
---+---------------------------------------------------------
1  | literature
2  | history
3  | philosophy
4  | plato

I am trying to create a join table that cross-references the items in books with their tags. Something like this:

books_tags:

book_id | tag_id
--------+---------------------------------------------------------
   1    | 3
   1    | 4
   2    | 1
   2    | 2

How do I take the books.tags string, convert it to an array, look up each item in the array, and insert it into the join table?

So far I have:

SELECT distinct(s.tag)
FROM books t, unnest(string_to_array(trim (t.tags, '[]'), '.')) s(tag);

This splits the string into an array, but how do I loop through each item and insert it into the join table?

Answer 1

You can achieve this with the following query:

WITH books(id, tags) AS (
  VALUES (1::int4, 'philosophy.plato'::text),
    (2, 'literature.history'),
    (3, 'cultural_history.18th_century.16th_century.history')
), tags (id, "name") AS (
  VALUES (1, 'literature'),
    (2, 'history'),
    (3, 'philosophy'),
    (4, 'plato')
)
SELECT b.id book_id,
       t.id tag_id
  FROM (
    SELECT id, regexp_split_to_table(tags, E'\\.') tag
      FROM books) b
  JOIN tags t ON t."name"=b.tag;

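Some notes:

Using name for a column is not a good idea - it is a reserved word. If you still want to keep it, it is best to wrap it in double quotes.
My WITH construct only stands in for your tables (with the brackets already removed from tags); you can skip it in your case.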
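The query above only selects the pairs. To actually fill the join table, the same SELECT can feed an INSERT. A minimal sketch, assuming a books_tags table with columns book_id and tag_id already exists, and that books.tags holds the bracketed strings shown in the question (hence the extra trim(), which the WITH sample data did not need):

```sql
-- Assumption: books_tags(book_id, tag_id) has already been created.
-- Assumption: books.tags looks like '[philosophy.plato]', so the
-- surrounding brackets are trimmed before splitting on the dot.
INSERT INTO books_tags (book_id, tag_id)
SELECT DISTINCT b.id, t.id          -- DISTINCT guards against repeated tags in one string
FROM  (
   SELECT id, regexp_split_to_table(trim(tags, '[]'), E'\\.') AS tag
   FROM   books
   ) b
JOIN   tags t ON t."name" = b.tag;  -- inner join: unknown tags are skipped
```

Because this is an inner join, any tag string without a matching row in tags (such as cultural_history in the sample data) is silently left out of the join table.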

Answer 2

Combining unnest() with string_to_array() (as you already have) is typically much faster than regexp_split_to_table(), because regular expressions are expensive. Compare:

Error while using regexp_split_to_table (Amazon Redshift)

I suggest a LATERAL join:

What is the difference between LATERAL and a subquery in PostgreSQL?

CREATE TABLE books_tags AS
SELECT b.id AS book_id, t.id AS tag_id
FROM   books b, unnest(string_to_array(trim(b.tags, '[]'), '.')) x(tag)
JOIN   tags  t ON t.name = x.tag
-- GROUP BY 1, 2  -- only if there can be duplicates
-- ORDER BY 1, 2; -- optional, but probably good for performance

"name" is not a reserved word - but it is still a very poor choice for a column name, because it is not descriptive. I would just use "tag" as the name of the tag column.

You may want to add a unique constraint and foreign keys to complete the many-to-many relationship. More:

How to implement a many-to-many relationship in PostgreSQL?
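The constraints mentioned above could be added along these lines. This is only a sketch: it assumes books.id and tags.id are primary keys (foreign keys need a unique target), that books_tags contains no duplicate or NULL pairs, and the ON DELETE CASCADE behaviour is an illustrative choice rather than something stated in the question:

```sql
-- Assumption: books(id) and tags(id) are primary keys or carry unique constraints.
-- The composite primary key doubles as the unique constraint on (book_id, tag_id).
ALTER TABLE books_tags
   ADD PRIMARY KEY (book_id, tag_id),
   ADD FOREIGN KEY (book_id) REFERENCES books (id) ON DELETE CASCADE,  -- cascade is optional
   ADD FOREIGN KEY (tag_id)  REFERENCES tags (id)  ON DELETE CASCADE;
```

If the table was built without GROUP BY or DISTINCT and a book lists the same tag twice, adding the primary key fails until the duplicates are removed.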

Postgres - loop through string entries and create a join table
