Question

I'm trying to match all text from the beginning up to the second dot, ignoring any dots that appear inside HTML tags.

The following regex, /^([^\.]*[\.]){0,2}/, works fine when there is no HTML markup: it selects everything from the start up to the second dot.

However, when I have this:

<p><img src="example.image.com" alt="foo">Text. More text.</p>

I want my regex to stop at the second dot in the text itself (the one after "More text"), not at the dot between "image" and "com".
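A quick way to reproduce the behaviour described above is to run the original pattern with Python's re module. Python is used only for illustration; the pattern should behave the same way in JavaScript. The names ORIGINAL and prefix are hypothetical.

```python
import re

# The pattern from the question: up to two runs of
# "non-dot characters followed by a dot", anchored at the start.
ORIGINAL = re.compile(r"^([^\.]*[\.]){0,2}")

plain = "Text. More text. Even more text."
html = '<p><img src="example.image.com" alt="foo">Text. More text.</p>'

def prefix(pattern, s):
    """Return the text matched at the start of s (empty string if no match)."""
    m = pattern.match(s)
    return m.group(0) if m else ""

print(prefix(ORIGINAL, plain))  # Text. More text.
print(prefix(ORIGINAL, html))   # <p><img src="example.image.
```

With plain text the match ends after "More text.", but with the HTML input it stops inside the src attribute, because the pattern has no notion of tags.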

I also know that \.(?![^><]*>) selects every dot outside HTML tags, but I'm really struggling to combine the two, and I'd really appreciate your help!
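One way to build on that lookahead, sketched in Python: iterate over the dots it accepts and cut the string right after the second one. This is not the approach used in the answer; the helper up_to_nth_text_dot is hypothetical, and like the lookahead itself it assumes well-formed markup with no '<' or '>' inside attribute values.

```python
import re

# The lookahead from the question: a dot NOT followed by
# "characters other than angle brackets, then '>'", i.e. a dot outside any tag.
# Heuristic: assumes no '<' or '>' inside attribute values.
DOT_OUTSIDE_TAG = re.compile(r"\.(?![^><]*>)")

def up_to_nth_text_dot(s, n=2):
    """Hypothetical helper: return s up to and including the n-th dot
    outside HTML tags, or None if there are fewer than n such dots."""
    for count, m in enumerate(DOT_OUTSIDE_TAG.finditer(s), start=1):
        if count == n:
            return s[: m.end()]
    return None

html = '<p><img src="example.image.com" alt="foo">Text. More text.</p>'
print(up_to_nth_text_dot(html))
# <p><img src="example.image.com" alt="foo">Text. More text.
```

The dots in "example.image.com" are rejected because each one is followed by non-bracket characters and then the tag's closing '>'.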

Answer 1

Try this regex:

(?:(?:(?:<[^>]+>)*[^<.]*)*\.){2}

(?:                  # start of non-capturing group
    (?:              # start of non-capturing group
        (?:          # start of non-capturing group
            <[^>]+>  # matches an HTML tag
        )*           # match zero or more tags
        [^<.]*       # matches a sequence of non-tag, non-dot characters
    )*               # repeat: any number of tag/text runs
    \.               # match a dot
){2}                 # match the whole group exactly twice
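To try the pattern with its comments intact, it can be written in Python's verbose mode (re.VERBOSE), which ignores unescaped whitespace and # comments outside character classes. This is a minimal sketch; the name TWO_TEXT_DOTS is hypothetical, and re.match anchors the pattern at the start of the string like the ^ in the question.

```python
import re

# The answer's pattern in verbose mode, so the breakdown can stay inline.
TWO_TEXT_DOTS = re.compile(
    r"""
    (?:                   # one sentence; the group is matched exactly twice
        (?:
            (?:<[^>]+>)*  # zero or more HTML tags
            [^<.]*        # a run of text with no '<' and no '.'
        )*                # any number of tag/text runs
        \.                # the dot that ends this sentence
    ){2}
    """,
    re.VERBOSE,
)

html = '<p><img src="example.image.com" alt="foo">Text. More text.</p>'
m = TWO_TEXT_DOTS.match(html)
print(m.group(0) if m else None)
# <p><img src="example.image.com" alt="foo">Text. More text.
```

Because the pattern nests quantifiers, long inputs that do not contain two dots outside tags may cause heavy backtracking, so it is worth testing on realistic input sizes.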

Detailed explanation and demo here.
