Question

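I am trying to write a highlight plugin and would like to preserve the HTML formatting. Is it possible to ignore all the characters between < and > in a string when doing a replace with JavaScript?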

Take the following as an example:

var string = "Lorem ipsum dolor span sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";

I would like to be able to achieve the following (replacing 'dolor' with 'FOO'):

var string = "Lorem ipsum FOO span sit amet, consectetuer <span class='dolor'>FOO</span> adipiscing elit.";

Or perhaps even this (replacing 'span' with 'BAR'):

var string = "Lorem ipsum dolor BAR sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";

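I came very close to an answer with the one given by tambler here: "Can you ignore HTML in a string while doing a Replace with jQuery?" But for some reason I just can't get the accepted answer to work.

I'm completely new to regular expressions, so any help is appreciated.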

Answer 1

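Parsing the HTML with the browser's built-in parser via innerHTML and then traversing the DOM is the sensible way to do this. Here's an answer loosely based on another answer:

Live demo: http://jsfiddle.net/FwGuq/1/

Code: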

// Reusable generic function
function traverseElement(el, regex, textReplacerFunc) {
    // script and style elements are left alone
    // (tagName is uppercase in HTML documents, so match case-insensitively)
    if (!/^(script|style)$/i.test(el.tagName)) {
        var child = el.lastChild;
        while (child) {
            if (child.nodeType == 1) {
                traverseElement(child, regex, textReplacerFunc);
            } else if (child.nodeType == 3) {
                textReplacerFunc(child, regex);
            }
            child = child.previousSibling;
        }
    }
}

// This function does the replacing for every matched piece of text
// and can be customized to do what you like
// (the replacement text "FOO" is hard-coded here)
function textReplacerFunc(textNode, regex) {
    textNode.data = textNode.data.replace(regex, "FOO");
}

// The main function
function replaceWords(html, words) {
    var container = document.createElement("div");
    container.innerHTML = html;

    // Replace the words one at a time to ensure each one gets matched
    for (var i = 0, len = words.length; i < len; ++i) {
        traverseElement(container, new RegExp(words[i], "g"), textReplacerFunc);
    }
    return container.innerHTML;
}


var html = "Lorem ipsum dolor span sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";
alert( replaceWords(html, ["dolor"]) );
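
One caveat worth noting: replaceWords passes each word straight to new RegExp, so a word containing regex metacharacters (for example a dot) is treated as a pattern rather than as literal text. The following is a minimal sketch of one way to guard against that. escapeRegExp is a hypothetical helper, not part of the answer above, and the sample strings are made up for illustration.

```javascript
// Hypothetical helper (an assumption, not part of the original answer):
// escape regex metacharacters so a search word is matched literally.
function escapeRegExp(word) {
    return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var text = "a.b axb a.b";

// Unescaped: "." matches any character, so "axb" is replaced as well.
var unescaped = text.replace(new RegExp("a.b", "g"), "FOO");

// Escaped: only the literal "a.b" is replaced.
var escaped = text.replace(new RegExp(escapeRegExp("a.b"), "g"), "FOO");
```

If you want this behaviour, the call inside replaceWords could become new RegExp(escapeRegExp(words[i]), "g").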

Answer 2

This solution works in Perl, and the pattern should also work in JavaScript, since it is compatible with ECMA-262:

s,\bdolor\b(?=[^"'][^>]*>),FOO,g

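Basically, it replaces the word if it is followed by one character that is not a quote, then any run of characters that are not a closing >, and then the closing > itself.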
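
For reference, here is a rough sketch of how the same pattern might be written with String.prototype.replace in JavaScript, run against the string from the question. The variable names are my own. It also shows a limitation: the lookahead does not really know whether the word sits inside a tag, so replacing 'span' rewrites the opening tag name too.

```javascript
var input = "Lorem ipsum dolor span sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";

// JavaScript form of the Perl substitution s,\bdolor\b(?=[^"'][^>]*>),FOO,g
var dolorReplaced = input.replace(/\bdolor\b(?=[^"'][^>]*>)/g, "FOO");

// Same pattern for 'span': the word in the text is replaced, but so is the
// tag name in "<span class=...>", because it is also followed by a non-quote
// character and, later on, a ">".
var spanReplaced = input.replace(/\bspan\b(?=[^"'][^>]*>)/g, "BAR");

// A word after the last ">" is never matched, since the lookahead needs a ">".
var elitReplaced = input.replace(/\belit\b(?=[^"'][^>]*>)/g, "FOO");

console.log(dolorReplaced);
console.log(spanReplaced);
console.log(elitReplaced);
```

So the pattern handles the 'dolor' case from the question, but not the 'span' case, and it misses any text that comes after the last tag.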
