Regex specific exclusion

Time: 2018-01-08 18:04:32

Tags: regex regex-lookarounds regex-group

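I need a regex pattern that matches the first occurrence of a word that is not contained in an 'a' tag but may be contained in any other tag.

I.e., a negative lookahead that checks whether the matched word is inside an 'a' tag; if it is, ignore it and keep looking for a valid match.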
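
A minimal Java sketch of the lookahead idea, assuming the content of an `<a>` element never contains nested tags. Under that assumption, "the next tag after the word is `</a>`" stands in for "the word is inside an a tag", which is what `(?![^<]*</a>)` rejects. The class and method names (`LookaheadSketch`, `firstOutsideAnchor`) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadSketch {
    // Index of the first whole-word occurrence of `word` whose next tag is not
    // </a>, or -1 if there is none.
    // Assumption: text inside an <a> element contains no nested tags.
    public static int firstOutsideAnchor(String html, String word) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b" // the word itself, whole word only
                + "(?![^<]*</a>)");                 // negative lookahead: fail if the next '<' begins </a>
        Matcher m = p.matcher(html);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String payload1 = "<p>Sample 1 <a href=\"shouldNotMatchWrappedInA\">wordToMatch</a>"
                + " some random text to not be matched followed by wordToMatch, this should work.</p>";
        int idx = firstOutsideAnchor(payload1, "wordToMatch");
        if (idx >= 0) {
            // Print the match position and the text starting at the match.
            System.out.println(idx + ": " + payload1.substring(idx));
        }
    }
}
```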

Example strings

Payload 1:

<p>Sample 1 <a href="shouldNotMatchWrappedInA">wordToMatch</a> some random text 
to not be matched followed by wordToMatch, this should work.</p>

Expected result 1:

wordToMatch ("Not the one inside of the 'a' tags but the following one")

Payload 2:

<p>Sample 2 <a href="shouldNotMatchWrappedInA">wordToMatch</a> some random text 
to not be matched followed by <b>wordToMatch</b> this should work.</p>

Expected result 2:

wordToMatch ("The one inside of the 'b' tags")

Payload 3:

<p>Sample 3 <a href="shouldNotMatchWrappedInA">wordToMatch</a> some 
random text to not be matched followed by wordToMatch followed by 
further occurrences of wordToMatch which should not be matched.</p>

Expected result 3:

wordToMatch ("The second occurrence of the term")

Please help :'(

The language used is Java.
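
Since the target is Java, here is one possible Java sketch that uses a different technique than a lookahead: one branch of an alternation matches a whole `<a ...>...</a>` element and the code discards it, while the other branch captures the word in group 1. It assumes `<a>` elements are not nested in one another and that the word does not appear inside attribute values of other tags; `SkipAnchorSketch` and `firstOutsideAnchor` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkipAnchorSketch {
    // Branch 1 consumes a whole <a ...>...</a> element, so its contents are skipped;
    // branch 2 captures a whole-word occurrence of the target in group 1.
    // Assumption: <a> elements are not nested inside each other.
    public static int firstOutsideAnchor(String html, String word) {
        Pattern p = Pattern.compile(
                "<a\\b[^>]*>.*?</a>"                    // lazy .*? stops at the first </a>
                + "|\\b(" + Pattern.quote(word) + ")\\b",
                Pattern.DOTALL);                         // let '.' cross line breaks
        Matcher m = p.matcher(html);
        while (m.find()) {
            if (m.group(1) != null) {
                return m.start(1); // the word matched outside any <a> element
            }
            // Otherwise an <a> element was consumed; find() resumes after it.
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] payloads = {
            "<p>Sample 1 <a href=\"shouldNotMatchWrappedInA\">wordToMatch</a> some random text"
                + " to not be matched followed by wordToMatch, this should work.</p>",
            "<p>Sample 2 <a href=\"shouldNotMatchWrappedInA\">wordToMatch</a> some random text"
                + " to not be matched followed by <b>wordToMatch</b> this should work.</p>",
            "<p>Sample 3 <a href=\"shouldNotMatchWrappedInA\">wordToMatch</a> some\n"
                + "random text to not be matched followed by wordToMatch followed by\n"
                + "further occurrences of wordToMatch which should not be matched.</p>"
        };
        for (String html : payloads) {
            System.out.println(firstOutsideAnchor(html, "wordToMatch"));
        }
    }
}
```

Compared with a lookahead such as `(?![^<]*</a>)`, which only inspects the next tag, this version also skips occurrences wrapped in tags nested inside an `<a>`, such as `<a><b>wordToMatch</b></a>`.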

1 answer:

Answer 0 (score: 0)

The simplest pattern I can think of is:

<a[^>]*>(\w+)<\/a>

To test it, run this Perl script:

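use strict;
use warnings;

my $result = "<p>Sample 1 <a href=\"shouldNotMatchWrappedInA\">wordToMatch</a> some random text to not be matched followed by <b>wordToMatch</b>, this should work.</p>";

# Group 1: the word inside <a>...</a>; group 2: its next occurrence after the tag.
if ($result =~ m/<a[^>]*>(\w+)<\/a>.*?(\1)/s) {
    print "$2\n";
}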

Note that you need to use the second capture group. Unfortunately, I can't give you the answer in Java.
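
For reference, a possible Java port of the same two-group idea is sketched here. It is not the answerer's code: it uses `[^>]*` instead of `.*` inside the tag, makes the middle `.*?` lazy so that group 2 is the first later occurrence rather than the last, and passes `Pattern.DOTALL` so payloads spanning several lines still match. Like the Perl version, it only finds the word when the same word also appears inside an earlier `<a>` element, so `<p>wordToMatch</p>` alone yields no match. `BackreferenceSketch` and `secondGroupStart` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackreferenceSketch {
    // Group 1: the word inside <a ...>...</a>.
    // Group 2: the next later occurrence of the same text, via the backreference \1.
    private static final Pattern PATTERN =
            Pattern.compile("<a[^>]*>(\\w+)</a>.*?(\\1)", Pattern.DOTALL);

    // Start index of group 2, or -1 if the pattern does not match.
    public static int secondGroupStart(String html) {
        Matcher m = PATTERN.matcher(html);
        return m.find() ? m.start(2) : -1;
    }

    public static void main(String[] args) {
        String payload3 = "<p>Sample 3 <a href=\"shouldNotMatchWrappedInA\">wordToMatch</a> some\n"
                + "random text to not be matched followed by wordToMatch followed by\n"
                + "further occurrences of wordToMatch which should not be matched.</p>";
        Matcher m = PATTERN.matcher(payload3);
        if (m.find()) {
            // Use the second capture group, as the answer suggests.
            System.out.println(m.group(2) + " at index " + m.start(2));
        }
    }
}
```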