How to match with RegEx outside HTML tags

Asked: 2012-09-19 20:29:41

Tags: html regex replace tags

  

Possible duplicate:
  RegEx match open tags except XHTML self-contained tags

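How can I match certain alphanumeric words only when they appear outside HTML tags, instead of matching every occurrence of the word?

Example: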

<div id="mariano mariano mariano" nota="mariano/mariano">mariano was looking forward Mariano. I want to match this "Mariano" too. Mariano</div>

In this example, I want to match every "Mariano" that is outside the tag's id (and its other attributes).

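I think the key is to look ahead from the word: if a "<" comes before any ">", the word is outside a tag and should match; if the regex finds a ">" before a "<", the word is inside a tag. However, I have not been able to write a regex that does this.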
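The idea described above can be sketched with a negative lookahead: a word is treated as being inside a tag if a ">" follows it before any "<". This is only a minimal sketch, not a general HTML parser; it assumes attribute values never contain a raw "<" or ">", and the helper names `escapeRegExp` and `outsideTagsRegExp` are hypothetical.

```javascript
// Hypothetical helper: escape regex metacharacters in the search term.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a regex that matches `term` as a whole word, but only when it is
// NOT followed by a ">" that comes before any "<" (i.e. not inside a tag).
// Assumption: attribute values contain no raw "<" or ">".
function outsideTagsRegExp(term) {
  return new RegExp("\\b" + escapeRegExp(term) + "\\b(?![^<>]*>)", "gi");
}

var html = '<div id="mariano mariano mariano" nota="mariano/mariano">' +
  'mariano was looking forward Mariano. I want to match this "Mariano" too. Mariano</div>';

var matches = html.match(outsideTagsRegExp("mariano"));
console.log(matches.length); // 4: only the occurrences in the text content
```

Passing the same regex to `replace` leaves the `id` and `nota` attributes untouched, since every occurrence inside the opening tag is followed by that tag's closing ">".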

I have not managed to combine this regex, (?<=^|>)[^><]+?(?=<|$), with another one. My final, lowest-quality solution was:

<!-- language: lang-js -->
var searchFor = new RegExp("((!?<=^|>)" + termino + ")","ig");
var searchFor2 = new RegExp("(" + termino + "(?=<|$))","ig");
var searchFor3 = new RegExp("(!?<=^|[\\s\\.;,])" + termino + "(?=[\\s\\.;,]|$)","ig");

But these three do not cover all the cases.

Edit: I am using JavaScript:

<script>
container.find("p, span, div, .texto").each(function() {
var containerText = $(this).html();
for (var i = 0; i < terms.length; i++) {
    var termino = terms[i];
    // 1st issue: ">termino" was replaced with ">Pedro"
    var searchFor = new RegExp("((!?<=^|>)" + termino + ")","ig");
    containerText = containerText.replace(searchFor,">Pedroedro");
    // 2nd issue: "termino<" was replaced with "Pedro"
    var searchFor2 = new RegExp("(" + termino + "(?=<|$))","ig");
    containerText = containerText.replace(searchFor2,"Pedro");
    // 3rd issue: "[\.\s,;:]termino[\.\s,;:]" was replaced with " Pedro"
    var searchFor3 = new RegExp("(!?<=^|[\\s\\.;,])" + termino + "(?=[\\s\\.;,]|$)","ig");
    containerText = containerText.replace(searchFor3," Pedro");
};
$(this).html(containerText);
}); 
</script>

1 Answer:

Answer 0 (score: 1)

A few things:

  1. Welcome to Stack Overflow!
  2. Please search before asking. There are many results for parsing XML with regex.
  3. Don't use regex to parse XML/HTML! Try XPath:

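    var termino = "Mariano"; // example value; define it however you were before
    
    // Give me all text nodes inside divs whose content contains the value of "termino".
    // The term is quoted inside the XPath expression (assumes it contains no double quote).
    // Note: XPath contains() is case-sensitive.
    var iterator = document.evaluate('//div/text()[contains(., "' + termino + '")]', document, null, XPathResult.UNORDERED_NODE_ITERATOR_TYPE, null);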
    
    try {
      // init thisNode to the first item in the iterator
      var thisNode = iterator.iterateNext();
    
      // go through all items, alert their content (which should contain termino)
      while (thisNode) {
        alert( thisNode.textContent );
        thisNode = iterator.iterateNext();
      } 
    }
    catch (e) {
       console.error( 'Error: Document tree modified during iteration ' + e );
    }
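
Since the question is about replacing the word rather than only finding it, the same "work on text nodes, not markup" idea could also be sketched without XPath, by walking the DOM and rewriting only text nodes. This is a rough sketch under stated assumptions, not part of the original answer: the function names `escapeRegExp` and `replaceInTextNodes` are hypothetical, and it relies only on the standard `nodeType`, `nodeValue`, and `childNodes` properties.

```javascript
// Hypothetical helper: escape regex metacharacters in the search term.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Recursively visit every text node (nodeType 3) under `node` and replace
// whole-word, case-insensitive occurrences of `term` in its text.
// Attribute values are never touched because they are not text nodes.
function replaceInTextNodes(node, term, replacement) {
  var pattern = new RegExp("\\b" + escapeRegExp(term) + "\\b", "gi");

  function walk(current) {
    if (current.nodeType === 3) { // 3 is Node.TEXT_NODE
      current.nodeValue = current.nodeValue.replace(pattern, replacement);
      return;
    }
    var children = current.childNodes || [];
    for (var i = 0; i < children.length; i++) {
      walk(children[i]);
    }
  }

  walk(node);
}
```

In a browser this would be called as, for example, `replaceInTextNodes(document.body, "mariano", "Pedro")`. Note that this sketch also rewrites text inside `script` and `style` elements, so a real page would probably want to skip those.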