Question

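I'm looking for some pointers on writing a function (let's call it replaceGlobal) that takes an input string and a mapping of substrings to replacement values, and applies these mappings so that as many characters of the input string as possible are replaced. For example: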

replaceGlobal("abcde", {
    'a' -> 'w',
    'abc' -> 'x',
    'ab' -> 'y',
    'cde' -> 'z'
})

would return "yz" by applying 'ab' -> 'y' and 'cde' -> 'z'.

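The function applies only one round of substitutions, so it cannot replace a value and then use part of the replaced value as part of another substitution.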

A greedy approach produces non-optimal results (shown here in Javascript):

"abcde".replace(/(abc|cde|ab|a)/g, function(x) {
    return {
        'a': 'w',
        'abc': 'x',
        'ab': 'y',
        'cde': 'z'
    }[x];
});

returns 'xde'.

Any thoughts on a good starting point here?

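I think the problem boils down to finding the lowest-cost path in a weighted DAG, with the input string as the spine and the other edges provided by the substitutions: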

   /------x------------\
  /-----y------\        \
 /---w--\       \        \ /-------z------\
0 -----> a ----> b -----> c -----> d ----> e ----> $

where edges along the spine have a cost of 1, and the substitution edges have a cost of 0.

But that may be overcomplicating things.
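The DAG idea can be sketched as a forward pass over string positions, since every edge points to a later position. This is only an illustration under some assumptions: the name replaceGlobalDP, the plain-object rule format, and the cost/prev arrays are hypothetical, and ties between equally good paths are broken arbitrarily (first one found wins).

```javascript
// Sketch: lowest-cost path over positions 0..n of the input string.
// cost[i] = fewest unreplaced characters in any rewriting of str.slice(0, i).
function replaceGlobalDP(str, rules) {
    const n = str.length;
    const cost = new Array(n + 1).fill(Infinity);
    const prev = new Array(n + 1).fill(null); // edge used to reach i: { from, text }
    cost[0] = 0;

    for (let i = 0; i < n; i++) {
        // Spine edge: keep str[i] unchanged, cost 1.
        if (cost[i] + 1 < cost[i + 1]) {
            cost[i + 1] = cost[i] + 1;
            prev[i + 1] = { from: i, text: str[i] };
        }
        // Substitution edges: cost 0, jumping over the whole match.
        for (const [match, sub] of Object.entries(rules)) {
            if (match.length > 0 && str.startsWith(match, i)) {
                const j = i + match.length;
                if (cost[i] < cost[j]) {
                    cost[j] = cost[i];
                    prev[j] = { from: i, text: sub };
                }
            }
        }
    }

    // Walk the chosen edges back from the end and reassemble the output.
    const parts = [];
    for (let i = n; i > 0; i = prev[i].from) parts.push(prev[i].text);
    return parts.reverse().join('');
}

replaceGlobalDP('abcde', { a: 'w', abc: 'x', ab: 'y', cde: 'z' }); // 'yz'
```

Because all edges go left to right, cost[i] is already final when position i is processed, so no priority queue is needed.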

Answer 1

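It seems to me that dynamic programming is the way to go. This is due to the restriction:

The function applies only one round of substitutions, so it cannot replace a value and then use part of the replaced value as part of another substitution.

Specifically, suppose the input is some arbitrary string abcdefg. Now you apply a rule that replaces some middle part, say de -> x. You now have abc x fg, where the only strings (the smaller subproblems) you are still allowed to operate on are abc and fg. For repeated substrings, you can use memoization.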
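One way to picture the memoization is the sketch below. It is a simplification of the split described above, not necessarily what the answer had in mind: instead of splitting around a middle match, it always decides what happens at the leftmost position first, so every subproblem is a suffix and can be cached by its start index. The name replaceGlobalMemo, the plain-object rule format, and the { kept, text } record are hypothetical.

```javascript
// Sketch: top-down dynamic programming with memoization over suffixes.
function replaceGlobalMemo(str, rules) {
    // start index -> { kept, text }: best rewriting of str.slice(start),
    // where `kept` counts the characters left unreplaced.
    const memo = new Map();

    function solve(start) {
        if (start >= str.length) return { kept: 0, text: '' };
        if (memo.has(start)) return memo.get(start);

        // Option 1: leave str[start] as it is.
        const rest = solve(start + 1);
        let best = { kept: rest.kept + 1, text: str[start] + rest.text };

        // Option 2: apply any rule that matches at `start`; the remainder
        // is then an independent, smaller subproblem.
        for (const [match, sub] of Object.entries(rules)) {
            if (match.length === 0 || !str.startsWith(match, start)) continue;
            const tail = solve(start + match.length);
            if (tail.kept < best.kept) {
                best = { kept: tail.kept, text: sub + tail.text };
            }
        }

        memo.set(start, best);
        return best;
    }

    return solve(0).text;
}

replaceGlobalMemo('abcde', { a: 'w', abc: 'x', ab: 'y', cde: 'z' }); // 'yz'
```

The recursion depth grows linearly with the input length, so for very long strings an iterative version would be the safer choice.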

Answer 2

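Based on @Matt Timmermans's comments and the original DAG idea, here is what I came up with in Javascript as a first attempt (I'm more interested in the algorithm itself than in any specific language implementation):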

const replaceGlobal = (str, dict) => {
    let open = []; // set of substitutions being actively explored
    let best = { value: [], weight: 0 }; // optimal path info

    // For each character in the input string, left to right
    for (let c of str) {
        // Add new nodes to `open` for all `substitutions` that
        // start with `c`
        for (let entry of dict)
            if (entry.match[0] === c)
                open.push({
                    value: best.value.concat(entry.sub),
                    rest: entry.match,
                    weight: best.weight
                });

        // Add current character onto best path
        best.value.push(c);
        ++best.weight;

        // For each `open` path, try to match against the current character
        let new_open = [];
        for (let o of open) {
            if (o.rest[0] === c) {
                if (o.rest.length > 1) { // still more to match
                    new_open.push({
                        rest: o.rest.slice(1),
                        value: o.value,
                        weight: o.weight
                    });
                } else { // full match found
                    if (o.weight < best.weight)
                        best = o;
                }
            }
        }
        open = new_open;
    }
    return best.value.join('');
};

Which would be used like this:

replaceGlobal('abcde', [
    { match: 'a', sub: 'w' },
    { match: 'abc', sub: 'x' },
    { match: 'ab', sub: 'y' },
    { match: 'cde', sub: 'z' }
]) === 'yz'

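It passes some simple unit tests, but I may be overlooking something silly, and it still seems more complicated than it needs to be.

You could also make dict a trie of characters to make looking up the matches easier (and do the same with open). Even with the trie, though, I believe this approach would still be O(str.length * dict.length).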
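As a rough illustration of the trie idea, the sketch below builds a character trie from the same { match, sub } entries and walks it from a given position to collect every rule that matches there. The names buildTrie and matchesAt and the node shape { children, sub } are hypothetical, and the sketch covers only the lookup, not the full replaceGlobal.

```javascript
// Sketch: a character trie over the `match` strings of the dictionary.
function buildTrie(dict) {
    const root = { children: new Map(), sub: null };
    for (const { match, sub } of dict) {
        let node = root;
        for (let k = 0; k < match.length; k++) {
            const ch = match[k];
            if (!node.children.has(ch)) {
                node.children.set(ch, { children: new Map(), sub: null });
            }
            node = node.children.get(ch);
        }
        if (node !== root) node.sub = sub; // ignore empty match strings
    }
    return root;
}

// Collect every rule whose `match` occurs in str starting at `start`.
function matchesAt(trie, str, start) {
    const found = [];
    let node = trie;
    for (let i = start; i < str.length; i++) {
        node = node.children.get(str[i]);
        if (!node) break; // no rule continues with this character
        if (node.sub !== null) found.push({ end: i + 1, sub: node.sub });
    }
    return found;
}

const trie = buildTrie([
    { match: 'a', sub: 'w' },
    { match: 'abc', sub: 'x' },
    { match: 'ab', sub: 'y' },
    { match: 'cde', sub: 'z' }
]);
matchesAt(trie, 'abcde', 0);
// [{ end: 1, sub: 'w' }, { end: 2, sub: 'y' }, { end: 3, sub: 'x' }]
```

Each walk stops as soon as no rule can continue, so the work per position is bounded by the length of the longest match rather than by the number of dictionary entries.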

Globally optimal string replacement
