I have an interesting regex problem. Say I have a paragraph like this:
Johannesburg (; Afrikaans: ; also known as Jozi, Jo'burg, and eGoli) is the largest city in South Africa and one of the 50 largest urban areas in the world. It is the provincial capital and largest city of Gauteng, which is the wealthiest province in South Africa. While Johannesburg is not one of South Africa's three capital cities, it is the seat of the Constitutional Court. The city is located in the mineral-rich Witwatersrand range of hills and is the centre of large-scale gold and diamond trade.
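This regex
(^.*?[a-z]{2,}[.!?])\s+\W*[A-Z]
works well for finding the first sentence based on how sentences are built: a word of at least two lowercase letters ending in ., ! or ?, followed by whitespace and the capital letter that starts the next sentence. The problem comes when I have only one sentence, like this: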
Johannesburg (; Afrikaans: ; also known as Jozi, Jo'burg, and eGoli) is the largest city in South Africa and one of the 50 largest urban areas in the world.
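The regex doesn't match this sentence, which is understandable, since no other sentence starts after it. My question is: how can I adjust the expression so that it works in both cases?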
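The question doesn't name a regex engine, so the following is a minimal sketch using Python's re module to reproduce the problem. SINGLE is the sentence from the question; PARAGRAPH is an assumption, a shortened two-sentence version of the paragraph above. The pattern captures the first sentence of the paragraph but finds nothing in the lone sentence, because \s+\W*[A-Z] requires another sentence to follow.

```python
import re

# The question's pattern: group 1 is the first sentence, which must be followed
# by whitespace, optional non-word characters and an uppercase letter.
ORIGINAL = r"(^.*?[a-z]{2,}[.!?])\s+\W*[A-Z]"

# The single sentence from the question, plus a shortened two-sentence paragraph.
SINGLE = ("Johannesburg (; Afrikaans: ; also known as Jozi, Jo'burg, and eGoli) "
          "is the largest city in South Africa and one of the 50 largest urban "
          "areas in the world.")
PARAGRAPH = SINGLE + (" It is the provincial capital and largest city of "
                      "Gauteng, which is the wealthiest province in South Africa.")

m = re.search(ORIGINAL, PARAGRAPH)
print(m.group(1) == SINGLE)          # True: the first sentence is captured
print(re.search(ORIGINAL, SINGLE))   # None: no sentence follows the period
```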
Answer (score: 2)
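You can use an alternation to either match your existing logic (whitespace followed by the start of the next sentence) or assert the end of the string with $:
(^.*?[a-z]{2,}[.!?])(?:\s+\W*[A-Z]|$)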
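A small Python sketch (again assuming Python's re module and the same hypothetical SINGLE and PARAGRAPH test strings) checks that the alternation captures the first sentence in group 1 for both inputs. It also shows that, in the paragraph, the overall match still consumes the space and the "I" of the next sentence, because the non-capturing group (?:...) is part of the match.

```python
import re

# Either the next sentence starts, or the string ends right after the period.
ALTERNATION = r"(^.*?[a-z]{2,}[.!?])(?:\s+\W*[A-Z]|$)"

SINGLE = ("Johannesburg (; Afrikaans: ; also known as Jozi, Jo'burg, and eGoli) "
          "is the largest city in South Africa and one of the 50 largest urban "
          "areas in the world.")
PARAGRAPH = SINGLE + (" It is the provincial capital and largest city of "
                      "Gauteng, which is the wealthiest province in South Africa.")

for text in (PARAGRAPH, SINGLE):
    m = re.search(ALTERNATION, text)
    print(m.group(1) == SINGLE)      # prints True for each input

# The overall match still consumes " I" from the paragraph's second sentence.
print(re.search(ALTERNATION, PARAGRAPH).group(0).endswith("world. I"))  # True
```

In Python, $ without re.MULTILINE also matches just before a trailing newline, so a lone sentence that ends in a newline is still found.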
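To keep the capturing group without consuming the start of the next sentence, the same alternation can be placed in a positive lookahead:
(^.*?[a-z]{2,}[.!?])(?=\s+\W*[A-Z]|$)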
If you don't need the capturing group () in the first place, you can also omit it and use a positive lookahead (?= so that you get only the match:
^.*?[a-z]{2,}[.!?](?=\s+\W*[A-Z]|$)
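As a rough check of the lookahead patterns, the sketch below (Python's re module, same hypothetical test strings as before) runs both the capturing lookahead variant and the group-less one. Because the lookahead only inspects what follows the period, the whole match is exactly the first sentence in both the paragraph and the single-sentence case.

```python
import re

# Capturing version (group 1) and group-less version; in both, the lookahead
# checks what follows the period without consuming it.
WITH_GROUP = r"(^.*?[a-z]{2,}[.!?])(?=\s+\W*[A-Z]|$)"
NO_GROUP = r"^.*?[a-z]{2,}[.!?](?=\s+\W*[A-Z]|$)"

SINGLE = ("Johannesburg (; Afrikaans: ; also known as Jozi, Jo'burg, and eGoli) "
          "is the largest city in South Africa and one of the 50 largest urban "
          "areas in the world.")
PARAGRAPH = SINGLE + (" It is the provincial capital and largest city of "
                      "Gauteng, which is the wealthiest province in South Africa.")

for text in (PARAGRAPH, SINGLE):
    # The whole match (group 0) is exactly the first sentence.
    print(re.search(NO_GROUP, text).group(0) == SINGLE)     # True
    print(re.search(WITH_GROUP, text).group(1) == SINGLE)   # True
```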