Question

Possible duplicate:
RegEx match open tags except XHTML self-contained tags

I have an HTML-like string:

<html>
  <div>
      <p>this is sample content</p>
  </div>
  <div>
      <p>this is another sample</p>
      <span class="test">this sample should not caught</span>
      <div>
       this is another sample
      </div>
  </div>
</html>

Now I want to search this string for the word sample, but I should not get the "sample" that is inside <span>...</span>.

I would like to do this with a regular expression. I have tried a lot but could not get it to work; any help would be great.

Thanks in advance.

Answer 1

This is brittle, and it will fail if span tags can be nested. If they cannot be nested, try

(?s)sample(?!(?:(?!</?span).)*</span>)

This matches sample only if the next span tag (if any) is not a closing tag.

Explanation:

(?s)          # Switch on dot-matches-all mode
sample        # Match "sample".
(?!           # only if it's not followed by the following regex:
 (?:          #  Match...
  (?!</?span) #   (unless we're at the start of a span tag)
  .           #   any character
 )*           #  any number of times.
 </span>      #  Match a closing span tag.
)             # End of lookahead
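As a rough check of the pattern above, here is a minimal Python sketch that runs it against the string from the question. The names HTML, PATTERN and find_outside_span are made up for this illustration, and the choice of Python is an assumption; the answer does not name a regex engine.

```python
import re

# The HTML-like string from the question.
HTML = """<html>
  <div>
      <p>this is sample content</p>
  </div>
  <div>
      <p>this is another sample</p>
      <span class="test">this sample should not caught</span>
      <div>
       this is another sample
      </div>
  </div>
</html>"""

# The pattern from the answer: "sample" not followed by a closing
# </span> without another span tag in between.
PATTERN = re.compile(r"(?s)sample(?!(?:(?!</?span).)*</span>)")


def find_outside_span(text):
    """Return the start offsets of every match of PATTERN in text."""
    return [m.start() for m in PATTERN.finditer(text)]


positions = find_outside_span(HTML)
# Offset of the occurrence that sits inside <span>...</span>.
span_hit = HTML.index("sample should not")
print(positions, span_hit in positions)
```

On this input the word occurs four times, and the pattern should skip only the occurrence inside the span.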

To match sample only if it is inside neither a span nor a p element, you can use

(?s)sample(?!(?:(?!</?span).)*</span>)(?!(?:(?!</?p).)*</p>)
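A similar sketch for the two-lookahead variant, again in Python as an assumed engine and with illustrative names (HTML, BOTH, hits). On the string from the question, only the occurrence in the inner div is expected to survive, because the other three sit inside a p or a span.

```python
import re

# The HTML-like string from the question.
HTML = """<html>
  <div>
      <p>this is sample content</p>
  </div>
  <div>
      <p>this is another sample</p>
      <span class="test">this sample should not caught</span>
      <div>
       this is another sample
      </div>
  </div>
</html>"""

# One negative lookahead per excluded tag type (span and p).
BOTH = re.compile(
    r"(?s)sample"
    r"(?!(?:(?!</?span).)*</span>)"
    r"(?!(?:(?!</?p).)*</p>)"
)

hits = [m.start() for m in BOTH.finditer(HTML)]
print(hits)
```

Note that `</?p` also matches the start of any other tag whose name begins with p (for example `<pre`), which is one more reason to treat this as a sketch for simple inputs only.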

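But all of this depends entirely on the tags being unnested (i.e., no two tags of the same type may be nested) and correctly balanced (which is often not the case with <p> tags).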
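The nesting caveat can be seen in a small Python sketch. The input string nested is invented for this illustration; it places "sample" inside an outer span that also contains an inner span.

```python
import re

PATTERN = re.compile(r"(?s)sample(?!(?:(?!</?span).)*</span>)")

# Hypothetical input: "sample" is inside the outer span, but the next
# span tag after it is an opening <span>, not a closing </span>.
nested = "<span>outer sample <span>inner</span> tail</span>"

false_hits = PATTERN.findall(nested)
print(false_hits)
```

Here the lookahead only inspects the next span tag, so the word is reported as a match even though it is inside a span.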

How do I skip the content inside <span class=""> </span> tags when searching with a regular expression?
