Question

I am using a JavaScript RegExp to highlight search matches in HTML content.

To do that, I am using:

data.replace( new RegExp("("+search+")", 'g'), "<b id='searchHighlight'>$1</b>" );

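where data is the entire HTML content and search is the search string.

When searching for, e.g., h, it highlights the h inside words (such as "the") as well as the instances inside tags such as "<h1 id="title"> Something </h1>".

I cannot take a different approach, because I need to highlight the same HTML content with the same styling.

I have read about solutions such as: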

var input = "a dog <span class='something'> had a  </span> and a cat";
// Remove anything tag-like
var temp = input.replace(/<.+?>/g, "");
// Perform the search
var matches = new RegExp(exp, "g").exec(temp);

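However, since I need to highlight the search text within the same HTML content, I cannot simply strip the existing tags. Is there a way to do an include-and-exclude search in a RegExp, so that I can, for example, highlight the h in "the" as "t<b id='searchHighlight'>h</b>e"
without letting "<h1 id="title">Test</h1>" be corrupted into "<<b id='searchHighlight'>h</b>1 id="title">Test</<b id='searchHighlight'>h</b>1>"?

The HTML content is static and looks like this: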

    <h1 id="title">Samples</h1>
        <div id="content">
            <div  class="principle">
        <h2 id="heading">           
            PRINCIPLE</h2>


        <p>
            FDA recognizes that samples are an important part of ensuring that the right drugs are provided to the right patients. Under the Prescription Drug Marketing Act (PDMA), a sales representative is permitted to provide prescription drug samples to eligible healthcare professionals (HCPs). In order for BMS to provide this service, representatives must strictly abide by all applicable compliance standards pertaining to the distribution of samples.</p></div>
<h2 id="heading">           
            WHY DOES IT MATTER?</h2>
        <p>
            The Office of Inspector General (OIG) recognizes that samples can have monetary value to HCPs and, when used improperly, may have implications under the Federal False Claims Act and the Federal Anti-kickback Act. To minimize risk of such liability, the OIG requires the clear and conspicuous labeling of individual samples as units that cannot be sold.&nbsp; BMS and its business partners label every sample package to meet this requirement.&nbsp; Additionally, the HCP signature statement acknowledges that the samples will not be sold, billed or provided to family members or friends.</p>
        <h2 id="heading">

            WHO IS YOUR SMaRT PARTNER?</h2>
        <p>
            SMaRT is an acronym for &ldquo;Samples Management and Representatives Together&rdquo;.&nbsp; A SMaRT Partner has a thorough understanding of BMS sample requirements and is available to assist the field with any day-to-day policy or procedure questions related to sample activity. A SMaRT Partner will also:</p>

        <ul>
            <li style="margin-left:22pt;"> Monitor your adherence to BMS&rsquo;s sample requirements.</li>
            <li style="margin-left:22pt;"> Act as a conduit for sharing sample compliance issues and best practices.</li>
            <li style="margin-left:22pt;"> Respond to day-to-day sample accountability questions within two business days of receipt.</li>
        </ul>
        <p>

            Your SMaRT Partner can be reached at 888-475-2328, Option 3.</p>
        <h2 id="heading">

            BMS SAMPLE ACCOUNTABILITY POLICIES &amp; PROCEDURES</h2>
        <p>
            It is the responsibility of each sales representative to read, understand and follow the BMS Field Sample Accountability Procedures, USPSM-SOP-101. The basic expectations are:</p>
        <ul>
            <li style="margin-left:22pt;"> Transmit all sample activity by communicating your tablet to the host server on a <strong>daily</strong> basis.</li>
            <li style="margin-left:22pt;"> Maintain a four to six week inventory of samples rather than excessive, larger inventories that are more difficult to manage and increase your risk of non-compliance.</li>
            <li style="margin-left:22pt;"> Witness all HCP&rsquo;s signatures to confirm request and receipt of samples.</li>
        </ul>
</div>

The content is scattered all over the page rather than sitting inside a single tag, so DOM manipulation is not a solution for me.

Answer 1

If you are sure that there is no < or > inside the attributes of your tags, you can use

data = data.replace( 
    new RegExp( "(" + search + "(?![^<>]*>))", 'g' ),
        "<b id='searchHighlight'>$1</b>" );

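The negative lookahead (?![^<>]*>) prevents the replacement whenever a > appears further along in the string before any <, which is what happens when the match sits inside a tag.

This is far from foolproof, but it may be good enough.

BTW, since you are matching globally, i.e. making multiple replacements, id='searchHighlight' should be class='searchHighlight', because an id has to be unique within a document.

You need to take care that search does not contain any regex special characters.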
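
To make the last two points concrete, here is a minimal sketch that escapes the search string before building the pattern and uses class instead of id. The function names escapeRegExp and highlight are made up for this example, and the sketch still assumes that no attribute value contains < or >.

```javascript
// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wrap each occurrence of `search` that is not inside a tag.
// Assumption: attribute values never contain < or >.
function highlight(data, search) {
  const pattern = new RegExp('(' + escapeRegExp(search) + ')(?![^<>]*>)', 'g');
  return data.replace(pattern, "<b class='searchHighlight'>$1</b>");
}
```

With this, highlight('<h1 id="title">the</h1>', 'h') leaves both h1 tags alone and only wraps the h in "the". A search string such as "." is matched literally instead of matching every character.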

Answer 2

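You have probably realized by now that you are trying to use the wrong tool for the job, so this is just for the record (if you have not, you may find this insightful).

You will probably (certainly?) run into a basic problem with HTML attributes whose values are essentially arbitrary text, i.e. title (the tooltip attribute) and data-... (generic user-defined attributes that can hold arbitrary data by design): whatever you find in the text portions of the HTML code, you may find there too, and replacing it will wreck the tooltip and/or break some application logic. Note also that any character of the text content may be encoded as a named or numeric entity (e.g. & -> &amp;, &#38;, &#x26;), which can be handled in principle but complicates the dynamic regex (if your variable search holds plain text).

Having said all that, you may get along with data.replace( new RegExp("([>]?)[^><]*("+search+")[^><]*([<]?)", 'g'), "<b id='searchHighlight'>$1$2$3</b>" );, unless the search terms to be highlighted may contain characters that have a meaning in regex syntax, such as .+*|([{}])\ and possibly others; these you have to escape properly.

In summary: revise your design so that you do not run into trouble.

By the way, why have you not opted for DOM traversal? You do not need to know which HTML tags are actually present.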
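
To give an idea of how entity encoding complicates the dynamic regex, here is a rough sketch that expands a plain-text search string into a pattern accepting a few entity spellings. The table ENTITY_FORMS and the function entityAwarePattern are invented for this illustration. The table is deliberately incomplete: it ignores upper-case hex digits, other named entities, and unescaped literals.

```javascript
// Assumed, non-exhaustive table of entity spellings for characters that
// are normally escaped in HTML text content.
const ENTITY_FORMS = {
  '&': ['&amp;', '&#38;', '&#x26;'],
  '<': ['&lt;', '&#60;', '&#x3c;'],
  '>': ['&gt;', '&#62;', '&#x3e;'],
};

// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a regex source string in which each character
// listed above is replaced by an alternation of its entity spellings.
function entityAwarePattern(search) {
  return Array.from(search, (ch) => {
    const forms = ENTITY_FORMS[ch];
    return forms
      ? '(?:' + forms.map(escapeRegExp).join('|') + ')'
      : escapeRegExp(ch);
  }).join('');
}
```

A plain-text search for "POLICIES & PROCEDURES" then turns into a pattern that matches "POLICIES &amp; PROCEDURES" in the markup, at the price of a noticeably longer expression.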

Answer 3

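This is not a pure RegExp solution, but if you cannot traverse the DOM, then string manipulation with function-based replacement and loops may work for you.

Declare the variables you need and get the innerHTML of the document body.
Go through the data, extracting every tag and saving it in an array. Leave a placeholder behind so that you know where to put the tags back later.
With all tags replaced by temporary placeholders in the string, you can replace the characters you want using your original code, assigning the result back to data.
Then you need to restore the tags by reversing the earlier process.
Assign the new data as the innerHTML of the document body.

Here is the process in action.

Here is the code: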

var data = document.body.innerHTML, // get the DOM as a string
    tagarray = [], // a place to temporarily store all your tags
    tagmatch = /<[^>]+>/g, // for matching tags
    tagplaceholder = '<>', // could be anything but should not match the RegExp above, and not be the same as the search string below
    search = 'h'; // for example; but this could be set dynamically

while (tagmatch.test(data)) {
    data = data.replace(tagmatch, function (str) {
        tagarray.push(str); // store each matched tag in your array
        return tagplaceholder; // whatever your placeholder should be
    });
}

data = data.replace( new RegExp("("+search+")", 'g'), "<b id='searchHighlight'>$1</b>" ); // now search and replace the string of your choice

while (new RegExp(tagplaceholder, 'g').test(data)) {
    data = data.replace(tagplaceholder, function (str) {
        return tagarray.shift(); // replace the placeholders with the tags you saved earlier to restore them
    });
}

document.body.innerHTML = data; // assign the changed `data` string to the body

Obviously, it would be better to put all of this into its own function, since you do not really want global variables like the ones above.
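
One way that could look is sketched below. The name highlightOutsideTags is made up. The sketch works on a string instead of reading document.body directly, escapes the search string, and uses class rather than id because there can be several matches. It assumes the search string contains neither < nor >.

```javascript
// Hypothetical helper that follows the same extract / replace / restore
// steps without creating any global variables.
function highlightOutsideTags(html, search) {
  const tags = [];
  const placeholder = '<>'; // never matched by the tag pattern below

  // 1. Pull out every tag in order, leaving a placeholder behind.
  let text = html.replace(/<[^>]+>/g, (tag) => {
    tags.push(tag);
    return placeholder;
  });

  // 2. Highlight the search string in what is left.
  //    Regex special characters in `search` are escaped first.
  const escaped = search.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  text = text.replace(
    new RegExp('(' + escaped + ')', 'g'),
    "<b class='searchHighlight'>$1</b>"
  );

  // 3. Put the saved tags back in their original order.
  return text.replace(/<>/g, () => tags.shift());
}
```

In a browser it could be called as document.body.innerHTML = highlightOutsideTags(document.body.innerHTML, 'h');.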

JavaScript regular expression exclude + include pattern matching
