Question

Suppose I have a string like this:

<code>Blah blah Blah
enter code here</code>
<code class="lol">enter code here
fghfgh</code>

I want to use JavaScript to replace everything that appears between <code> tags, using a callback function that HTML-encodes it.

This is what I have so far:

function code_parsing(data){
    // Don't escape & because we need that... in case we deliberately write them in
    var escape_html = function(data, p1, p2, p3, p4) {
        return p1.replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;").replace(/'/g, "&#039;");
    };

    data = data.replace(/<code[^>]*>([\s\S]*?)<\/code>/gm, escape_html);
    // \[start\](.*?)\[end\]
    return data;
}

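Unfortunately, this function strips the <code> tags and replaces them with just their content. I want to keep the <code> tags along with any number of attributes. If I simply hard-code the <code> tags back in, I lose the attributes.

I know regular expressions aren't the best tool for this, but there won't be any nested elements.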

Answer 1

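You shouldn't use regular expressions to parse HTML.

That said, you need to capture the parts you want to keep with parenthesized groups, and have the replacer function reattach them to the part you manipulate.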

data.replace(/(<code[^>]*>)([\s\S]*?)(<\/code>)/g,
             function (_, startTag, body, endTag) {
               return startTag + escapeHtml(body) + endTag;
             })
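
The snippet above calls escapeHtml without defining it. A minimal runnable sketch, assuming escapeHtml does the same replacements as the question's callback (so "&" is deliberately left alone); the names escapeHtml and codeParsing are only illustrative:

```javascript
// Hypothetical helper: performs the same replacements as the question's
// escape_html callback and, like it, does not touch "&".
function escapeHtml(text) {
  return text
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

function codeParsing(data) {
  return data.replace(
    // Group 1: start tag with its attributes, group 2: body, group 3: end tag.
    /(<code[^>]*>)([\s\S]*?)(<\/code>)/g,
    function (_, startTag, body, endTag) {
      // Only the body is escaped; both tags are put back unchanged.
      return startTag + escapeHtml(body) + endTag;
    }
  );
}

console.log(codeParsing('<code class="lol">if (a < b) { say("hi"); }</code>'));
// <code class="lol">if (a &lt; b) { say(&quot;hi&quot;); }</code>
```

Because the start tag is captured as a whole, whatever attributes it carries survive the replacement.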

To understand why you shouldn't use regular expressions to parse HTML, consider what this does with inputs like these:

<code title="Shows how to tell whether x > y">if (x &gt; y) { ... }</code>

<code lang="js">node.style.color = "<code lang="css">#ff0000</code>"</code>

<code>foo</CODE >

<textarea><code>My HTML code goes here</code></textarea>

<code>foo  <!-- commented out </code> --></code>
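
For what it's worth, a few of these inputs can be run against the regex from the snippet above to see where it goes wrong. A rough sketch, using the same pattern without the g flag so that exec returns the capture groups; the variable names are only for illustration:

```javascript
var pattern = /(<code[^>]*>)([\s\S]*?)(<\/code>)/;

// A ">" inside an attribute value ends the "start tag" too early,
// so the rest of the attribute leaks into the body.
var attr = pattern.exec(
  '<code title="Shows how to tell whether x > y">if (x &gt; y) { ... }</code>'
);
console.log(attr[1]); // <code title="Shows how to tell whether x >
console.log(attr[2]); //  y">if (x &gt; y) { ... }

// With nesting, the lazy body stops at the first </code> it finds.
var nested = pattern.exec(
  '<code lang="js">node.style.color = "<code lang="css">#ff0000</code>"</code>'
);
console.log(nested[2]); // node.style.color = "<code lang="css">#ff0000

// An uppercase end tag with trailing whitespace is not matched at all.
console.log(pattern.test("<code>foo</CODE >")); // false
```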

Answer 2

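Simple solution: in your escape_html function, after you've operated on the string but before returning it, prepend and append the tags to the string and return the whole thing.

Sometimes the simplest answer is the best answer :)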
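
One way this might look, as a sketch only: it keeps the question's single-group regex and assumes the start tag can be recovered from the full match, which the replacer receives as its first argument. The END_TAG constant and the startTag variable are names made up for this example.

```javascript
function code_parsing(data) {
  var END_TAG = "</code>";

  var escape_html = function (match, body) {
    // match = start tag + body + END_TAG, so the start tag (attributes
    // included) is whatever precedes the body in the full match.
    var startTag = match.slice(0, match.length - body.length - END_TAG.length);
    // Same replacements as in the question; "&" is left alone on purpose.
    var escaped = body
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;")
      .replace(/"/g, "&quot;")
      .replace(/'/g, "&#039;");
    return startTag + escaped + END_TAG;
  };

  return data.replace(/<code[^>]*>([\s\S]*?)<\/code>/g, escape_html);
}

console.log(code_parsing('<code class="lol">a < b</code>'));
// <code class="lol">a &lt; b</code>
```

Capturing the tags in their own groups, as in Answer 1, avoids the length arithmetic, but this variant leaves the original pattern untouched.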

How to replace the content between HTML tags without replacing the tags themselves
