Question

I would like to perform a substring replacement on a binary string. There is a function that does exactly this for strings of type text:

replace(string text, from text, to text)

Unfortunately, there is no equivalent for binary strings of type bytea.

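Now I wonder whether I need to reimplement this operation for binary strings, or whether I can use the corresponding basic string function for the task. Are there edge cases that could break my application with something like this: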

select replace('\000\015Hello World\000\015Hello World'::bytea::text,
               'World',
               'Jenny')::bytea

So far I could not find any specific notes on this in the documentation. Can someone help me out?

Answer 1

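The problem with casting to text and back to bytea is that it does not work when the search or replacement string matches part of an escaped byte in the text representation. Let's look at an example.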

(I am setting bytea_output to escape to see the text better; otherwise the output is all hex digits.)

The initial query:

 with input(x) as (values (('\000\015Hello World\000\015Hello World'::bytea)))
  select replace(x::text, 'World', 'Jenny')::bytea from input;

The result is fine:

                replace                 
----------------------------------------
 \000\015Hello Jenny\000\015Hello Jenny
(1 row)

But with a modified version that tries to replace the character 0 with 1:

with input(x) as (values (('\000\015Hello 0orld\000\015Hello 0orld'::bytea)))
  select replace(x::text, '0', '1')::bytea from input;

the result is:

                replace                 
----------------------------------------
 IMHello 1orldIMHello 1orld

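whereas the desired result would be \000\015Hello 1orld\000\015Hello 1orld. This happens because the replacement also hits the digits inside the intermediate text representation: \000\015 is turned into \111\115, which decodes to the bytes I and M when cast back to bytea.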
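To make the failure visible, the intermediate text value can be inspected before it is cast back. This is only a small sketch, assuming bytea_output is set to escape as in the queries above; the column aliases are made up for illustration.

```sql
-- Assumption: escape output, as used in the examples above.
SET bytea_output = 'escape';

WITH input(x) AS (VALUES ('\000\015Hello 0orld'::bytea))
SELECT x::text                    AS as_text,        -- escaped text form of the bytes
       replace(x::text, '0', '1') AS after_replace   -- digits inside the escapes are replaced too
FROM input;
-- as_text:       \000\015Hello 0orld
-- after_replace: \111\115Hello 1orld
```

With the default hex output format the text form is a run of hex digits instead, so a textual replace would either find nothing or rewrite arbitrary digits; in both cases the cast round trip is not a reliable substitute for a byte-level replace.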

Answer 2

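Following @DanielVérité's suggestion, I implemented a plpgsql function that performs the replacement on binary strings of type bytea. The implementation only uses functions from the binary string section of the documentation, so I believe it should be safe.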

Here is my code:

CREATE OR REPLACE FUNCTION
replace_binary(input_str bytea, pattern bytea, replacement bytea)
RETURNS bytea
AS $$
DECLARE
    buf bytea;
    pos integer;
BEGIN
    buf := '';
    -- validate input
    IF coalesce(length(input_str), 0) = 0 OR coalesce(length(pattern), 0) = 0
    THEN
        RETURN input_str;
    END IF;
    replacement := coalesce(replacement, '');
    LOOP
        -- find position of pattern in input
        pos := position(pattern in input_str);
        IF pos = 0 THEN
            -- not found: append remaining input to buffer and return
            buf := buf || substring(input_str from 1);
            RETURN buf;
        ELSE
            -- found: append substring before pattern to buffer
            buf := buf || substring(input_str from 1 for pos - 1);
            -- append replacement
            buf := buf || replacement;
            -- go on with substring of input
            input_str := substring(input_str from pos + length(pattern));
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql
IMMUTABLE;

For my test cases it works quite well:

with input(buf, pattern, replacement) as (values 
    ('tt'::bytea, 't'::bytea, 'ttt'::bytea),
    ('test'::bytea, 't'::bytea, 'ttt'::bytea),
    ('abcdefg'::bytea, 't'::bytea, 'ttt'::bytea),
    ('\000\015Hello 0orld\000\015Hello 0orld'::bytea, '0'::bytea, '1'::bytea))

select encode(replace_binary(buf, pattern, replacement), 'escape') from input;

The output is as expected:

               encode               
------------------------------------
 tttttt
 tttesttt
 abcdefg
 \000\rHello 1orld\000\rHello 1orld
(4 rows)
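Since the function operates on bytes rather than on an escaped text form, the pattern itself may contain non-printable bytes. The following is a hedged sketch of a few extra checks, assuming replace_binary from above has been created and standard_conforming_strings is on (the default); the column aliases are illustrative only.

```sql
-- Assumes replace_binary() as defined above already exists.
SELECT
    -- pattern made of the non-printable bytes 0x00 0x0d
    encode(replace_binary('\x000d48'::bytea, '\x000d'::bytea, '\xff'::bytea), 'hex') AS replaced,
    -- NULL input is returned unchanged by the validation branch
    replace_binary(NULL, '\x00'::bytea, '\x01'::bytea) IS NULL AS null_stays_null,
    -- an empty pattern leaves the input untouched
    encode(replace_binary('\x0001'::bytea, ''::bytea, '\xff'::bytea), 'hex') AS empty_pattern;
-- replaced: ff48, null_stays_null: t, empty_pattern: 0001
```

Using encode(..., 'hex') for the output keeps the result independent of the session's bytea_output setting.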

Replacing a substring in a binary string