Question

I feel I'm very close with this, but as soon as I move the punctuation capture to the end of the sentence, it starts missing captures.

The sample text is as follows:

This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. This is a  sentence      with odd   spacing. This is one with lots of exclamation marks at the end!!!!This is another with a decimal 10.00 in the middle. Why is it so hard to find sentence endings?Last sentence without a space at the start.

This should result in the following captures:

This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. 
This is a  sentence      with odd   spacing. 
This is one with lots of exclamation marks at the end!!!!
This is another with a decimal 10.00 in the middle. 
Why is it so hard to find sentence endings?
Last sentence without a space at the start.

Here is my expression:

.*?(?:[!?.;]+)((?<!(Mr|Mrs|Dr|Rev).?)(?=\D|\s+|$)(?:[^!?.;\d]|\d*\.?\d+)*)(?=(?:[!?.;]+))

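There are currently two problems:

The punctuation is captured at the start of each match instead of at the end.
It correctly handles one name per sentence but not two. (Bonus points: I'd love for it to capture "Mr. D. J. Smith" correctly, but I can't figure out how to do that without it also failing to match sentences that end in a single letter.)

The data going into this will be somewhat normalised, so we know it will end with a full stop and be on a single line, but any pointers are welcome.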

Answer 1

I agree with @spender's suggestion to use a parser to handle all of the punctuation rules.

However, the following works for your scenario.

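// Needs: using System; using System.Text.RegularExpressions;
// s holds the input text; group 1 is the sentence without the trailing whitespace.
foreach (Match m in Regex.Matches(s, @"(.*?(?<!(?:\b[A-Z]|Mrs?|Dr|Rev|\d))[!?.;]+)\s*"))
    Console.WriteLine(m.Groups[1].Value);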

Output

This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. 
This is a  sentence      with odd   spacing. 
This is one with lots of exclamation marks at the end!!!!
This is another with a decimal 10.00 in the middle. 
Why is it so hard to find sentence endings?
Last sentence without a space at the start.

Ideone Demo
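
For anyone who wants to run the snippet above as-is, here is a minimal self-contained sketch. The pattern is the same one from the answer, only split into commented pieces; the class name SentenceSplitter and the method Split are hypothetical names picked for this illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the answer's regex (names are illustrative only).
public static class SentenceSplitter
{
    // Same pattern as in the answer, broken up so each part can be commented.
    private const string Pattern =
        "(" +                                  // group 1: the sentence itself
        @".*?" +                               // lazily consume text up to a terminator
        @"(?<!(?:\b[A-Z]|Mrs?|Dr|Rev|\d))" +   // terminator must not follow an initial, a title, or a digit
        @"[!?.;]+" +                           // one or more terminating punctuation marks
        ")" +
        @"\s*";                                // swallow whitespace between sentences (outside group 1)

    public static List<string> Split(string text)
    {
        var sentences = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern))
            sentences.Add(m.Groups[1].Value);
        return sentences;
    }

    public static void Main()
    {
        const string sample =
            "This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. " +
            "This is a  sentence      with odd   spacing. " +
            "This is one with lots of exclamation marks at the end!!!!" +
            "This is another with a decimal 10.00 in the middle. " +
            "Why is it so hard to find sentence endings?" +
            "Last sentence without a space at the start.";

        foreach (string sentence in Split(sample))
            Console.WriteLine(sentence);
    }
}
```

One thing to keep in mind: because the lookbehind rejects any terminator that directly follows a single capital letter, a listed title, or a digit, a sentence that genuinely ends that way (for example "We chose plan B." or "It happened in 2010.") is not split at that point and gets merged with the next sentence. This relies on .NET's support for variable-length lookbehind, so the pattern may need adjusting in other regex engines.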

Regex to match sentences containing decimals and names
