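I think I'm very close with this, but as soon as I move the punctuation capture to the end of the sentence, it starts missing captures.
The input looks like this: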
This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. This is a sentence with odd spacing. This is one with lots of exclamation marks at the end!!!!This is another with a decimal 10.00 in the middle. Why is it so hard to find sentence endings?Last sentence without a space at the start.
This should produce the following captures:
This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it.
This is a sentence with odd spacing.
This is one with lots of exclamation marks at the end!!!!
This is another with a decimal 10.00 in the middle.
Why is it so hard to find sentence endings?
Last sentence without a space at the start.
Here is my expression:
.*?(?:[!?.;]+)((?<!(Mr|Mrs|Dr|Rev).?)(?=\D|\s+|$)(?:[^!?.;\d]|\d*\.?\d+)*)(?=(?:[!?.;]+))
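There are currently two problems:
The data going into this will be somewhat normalized, so we know it will end with a full stop and sit on a single line, but any pointers are welcome.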
Answer 0 (score: 0)
I agree with @spender's suggestion to use a parser to handle all the punctuation rules.
However, the following works for your scenario.
// Needs: using System.Text.RegularExpressions; s holds the input text.
foreach (Match m in Regex.Matches(s, @"(.*?(?<!(?:\b[A-Z]|Mrs?|Dr|Rev|\d))[!?.;]+)\s*"))
    Console.WriteLine(m.Groups[1].Value);
Output
This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it.
This is a sentence with odd spacing.
This is one with lots of exclamation marks at the end!!!!
This is another with a decimal 10.00 in the middle.
Why is it so hard to find sentence endings?
Last sentence without a space at the start.
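For anyone who wants to try this outside of a larger project, here is a minimal, self-contained sketch of the same idea. The class name `SentenceSplitDemo` and the way the sample text is assembled are my own assumptions for illustration; the pattern is the one above, only spread out with `RegexOptions.IgnorePatternWhitespace` so each part can carry a comment.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the name is illustrative only.
class SentenceSplitDemo
{
    static void Main()
    {
        // Sample input from the question, kept on a single line.
        string s = "This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. "
                 + "This is a sentence with odd spacing. "
                 + "This is one with lots of exclamation marks at the end!!!!"
                 + "This is another with a decimal 10.00 in the middle. "
                 + "Why is it so hard to find sentence endings?"
                 + "Last sentence without a space at the start.";

        // Same pattern as above, laid out in free-spacing mode.
        var pattern = new Regex(@"
            (                        # group 1: one sentence plus its end punctuation
              .*?                    # as little text as possible
              (?<!                   # the punctuation must NOT directly follow...
                (?: \b[A-Z]          #   a lone capital initial, like the D in 'D.'
                  | Mrs? | Dr | Rev  #   a known title
                  | \d               #   a digit, so 10.00 is not split
                )
              )
              [!?.;]+                # one or more sentence-ending marks
            )
            \s*                      # skip any spacing before the next sentence
            ", RegexOptions.IgnorePatternWhitespace);

        foreach (Match m in pattern.Matches(s))
            Console.WriteLine(m.Groups[1].Value);
    }
}
```

With the sample text this should print the six sentences listed above. The lookbehind is also where the approach is fragile: a sentence that genuinely ends in a digit or a single capital letter (for example "We chose plan B.") will not be split at that point, and any title not in the list (such as "Prof") will be treated as a sentence end. That is the kind of case where a proper parser earns its keep.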