Question

I have some HTML and I need to match a phrase "My Phrase" that is not inside an <a> tag.

Phrases that should NOT match:

1. <a>My Phrase</a>
2. <a><strong>My Phrase</strong></a>

Phrases that SHOULD match:

3. <strong>My Phrase</strong>
4. My Phrase

My current solution uses negative lookahead to find matches that aren't followed by a closing </a> tag:

My Phrase(?![^<]*>|[^<>]*<\/a)

https://regex101.com/r/n1d9KZ/1


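As you can see in the example, it works for a plain text link (case 1), but it breaks on case 2, where the phrase is nested inside another tag within the <a> tag.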
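The behaviour described here can be reproduced with a short JavaScript sketch (JavaScript is used only for illustration; the variable names are hypothetical). It runs the lookahead from the question against the four cases. The lookahead only inspects the text up to the next tag, so in case 2 it sees </strong rather than </a and lets the match through.

```javascript
// The lookahead pattern from the question, unchanged.
const pattern = /My Phrase(?![^<]*>|[^<>]*<\/a)/;

// The four cases from the question, in order.
const cases = [
  '<a>My Phrase</a>',
  '<a><strong>My Phrase</strong></a>',
  '<strong>My Phrase</strong>',
  'My Phrase',
];

// true means the pattern reports a match for that case.
const results = cases.map((html) => pattern.test(html));
console.log(results); // [ false, true, true, true ]
```

Case 2 reports a match even though it should not, which is the failure the question describes.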

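Does anyone have a negative lookahead regex that works for both?

I can't use a negative lookbehind such as (?<!<a.*?>.*?)My Phrase(?!.*?<\/a>), because I get the error java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length. I also don't want to parse the HTML and strip out all the existing <a> tags, because I need to keep the HTML intact and replace "My Phrase" with "Another Phrase".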

Answer 1

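What you are trying to do is not trivial, because it is effectively impossible (only Jeff Dean can do it) to handle HTML completely with a regex.

There can be line breaks anywhere, attributes can be complex, tags can be nested, and the markup may simply be invalid.

That said, for the situation in your example (no href, and no line breaks inside or between the tags), you could do the following: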

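// Visit every line that contains the phrase; $0 is the whole line, $1 the phrase.
result = text.replace(/^.*?(My Phrase).*?$/gm, function($0, $1) {
    var regEx = new RegExp("(" + $1 + ")");
    // Leave lines that contain an <a tag untouched; otherwise wrap the first occurrence in <b>.
    return $0.indexOf('<a') >= 0 ? $0 : $0.replace(regEx, '<b>$1</b>');
});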

I only bolded the matches in this example, but you can do many other things in the callback: https://jsfiddle.net/8Ls0qbvj/
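For the nested case (case 2), a different technique from the line-based one above may be worth considering: an alternation whose first branch consumes a whole <a>...</a> element and whose second branch captures the phrase. The callback then replaces only when the capture group took part in the match. The sketch below is a minimal illustration, not a general HTML solution. The function name replaceOutsideAnchors is hypothetical, and it assumes lowercase, well-formed <a> elements whose attribute values contain no ">".

```javascript
// Hypothetical helper: replaces `phrase` only where it lies outside <a>...</a>.
// Assumes lowercase, well-formed anchors with no ">" inside attribute values.
function replaceOutsideAnchors(html, phrase, replacement) {
  // Escape regex metacharacters so the phrase is matched literally.
  const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Branch 1 consumes a complete anchor element; branch 2 captures the phrase.
  const pattern = new RegExp(
    '<a\\b[^>]*>[\\s\\S]*?<\\/a\\s*>|(' + escaped + ')', 'g');
  // If the capture group is undefined, an anchor was matched: keep it as-is.
  return html.replace(pattern, (match, outside) =>
    outside !== undefined ? replacement : match);
}

const sample = '<a href="#"><strong>My Phrase</strong></a> and <strong>My Phrase</strong>';
console.log(replaceOutsideAnchors(sample, 'My Phrase', 'Another Phrase'));
```

Because the anchor branch comes first and swallows the whole element, any tags nested inside the link never reach the phrase branch. The same alternation idea should carry over to java.util.regex, for example with a Matcher loop that checks whether group 1 is null before calling appendReplacement, though that variant is untested here.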
