Question

I want to populate two columns from the result of matching a regular expression against another column of the same table.

Extracting the matches into an array is easy:

select regexp_matches(description, '(?i)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$') matches from room;

(Note that only some of the rows match, not all of them.)

But to do the update, I can't find anything simpler than either

1) repeating this ridiculous regular expression:

update room r set
    link=(regexp_matches(description, '(?i)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$'))[1],
    description=(regexp_matches(description, '(?i)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$'))[2]
where description ~ '(?i)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$';

2) a query with a subquery and a join on id, which looks overly complicated and is probably not the most efficient:

update room r set link=matches[1], description=matches[2] from (
    select id, regexp_matches(description, '(?i)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$') matches from room
) s where matches is not null and r.id=s.id;

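What is the proper solution here? I suspect one of PostgreSQL's magic array functions would do the trick, or another regexp-related function, or maybe something even simpler.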

Answer 1

Starting with 9.5, you can use the following syntax:

with p(pattern) as (
  select '(?in)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$'
)
update room
set    (link, description) = (select m[1], m[2]
                              from   regexp_matches(description, pattern) m)
from   p
where  description ~ pattern;

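This way regexp_matches() is called only once per row, but the regular expression itself is still evaluated twice (once in regexp_matches() and once in the where clause). If you want to avoid that, you still need to use a join. Or, you could do: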

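update room
set    (link, description) = (
  select coalesce(m[1], l), coalesce(m[2], d)
  from   (select link l, description d) s
         -- left join: regexp_matches() returns no rows when there is no match,
         -- and a plain cross join would then set both columns to null
         left join regexp_matches(d, '(?in)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$') m on true
);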

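But this will "touch" every row no matter what (each row gets rewritten). It just won't change the values of link and description when there is no match.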
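For completeness, here is a rough sketch of the join variant mentioned above, which should evaluate the regular expression once per row and update only the rows that match. It assumes room.id is a unique key, as in the question's second attempt; the aliases s and m are just illustrative names.

update room r
set    link        = s.m[1],
       description = s.m[2]
from   (
  -- regexp_matches() in the from clause returns no rows when there is
  -- no match, so non-matching rows drop out of s and are never updated
  select id, m
  from   room,
         regexp_matches(description, '(?in)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$') m
) s
where  r.id = s.id;

This is close to the question's second query, except that the set-returning function sits in the from clause, so no "matches is not null" filter is needed.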
