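I'm trying to read text from a file and work with it. The problem is that I need to split it into sentences, and I can't figure out how to do that...
Here is a sample of the text file: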
I went to a shop. I bought a pack of sausages
and some milk. Sadly I forgot about the potatoes. I'm on my way
to the store
to buy potatoes.
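As you can see, a sentence can span several lines before it ends. I know I should probably use a regular expression, but I can't come up with one...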
Answer 0 (score: 0)
Assuming you define a sentence as any non-empty part of the input delimited by periods,
maybe something like this:
(?<=^|\.)(.+?)(\.|$)
The key is probably to use the RegexOptions.Singleline option, so that . matches any character (instead of any character except \n).
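The pattern above, explained in more detail:
(?<=^|\.) is a zero-width positive lookbehind assertion. It requires the match to start either at the beginning of the input or right after a period. The period itself does not become part of the match.
(.+?) is the sentence content. The +? quantifier is called lazy because it matches the shortest possible part of the input. This keeps it from swallowing the period that belongs to the next part of the pattern, or the sentences that follow.
(\.|$) matches the sentence terminator or the end of the input.
A complete working example: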
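// Requires: using System; using System.Text.RegularExpressions;
Regex r = new Regex(@"(?<=^|\.)(.+?)(\.|$)", RegexOptions.Singleline);
string input = @"I went to a shop. I bought a pack of sausages
and some milk. Sadly I forgot about the potatoes. I'm on my way
to the store
to buy potatoes.";
// Declare the loop variable as Match: MatchCollection enumerates as object on older frameworks
foreach (Match match in r.Matches(input))
{
    // Each match includes its closing period and any line breaks inside the sentence
    string sentence = match.Value.Trim();
    Console.WriteLine(sentence);
}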
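The matches produced by this pattern keep the original line breaks and the space that follows each period. Below is a minimal sketch of one way to clean them up, assuming it is acceptable to collapse every run of whitespace into a single space. The names SentenceRegexDemo and ExtractSentences are made up for illustration and are not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SentenceRegexDemo
{
    // Same pattern as in the answer; Singleline lets '.' match newlines too.
    private static readonly Regex SentencePattern =
        new Regex(@"(?<=^|\.)(.+?)(\.|$)", RegexOptions.Singleline);

    // Hypothetical helper: returns each match with whitespace runs collapsed.
    public static List<string> ExtractSentences(string input)
    {
        var result = new List<string>();
        foreach (Match match in SentencePattern.Matches(input))
        {
            // Replace line breaks and repeated spaces with one space, then trim the ends.
            string sentence = Regex.Replace(match.Value, @"\s+", " ").Trim();
            if (sentence.Length > 0)
                result.Add(sentence);
        }
        return result;
    }

    public static void Main()
    {
        string input = "I went to a shop. I bought a pack of sausages\nand some milk.";
        foreach (string s in ExtractSentences(input))
            Console.WriteLine(s);
    }
}
```

Each returned sentence still ends with its period, because the terminator is part of the match.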
Answer 1 (score: 0)
I tried joining the separate lines into one continuous string and then splitting it into sentences.
This is the method I tried to use:
range
Please tell me if there is a better way to do this.
Answer 2 (score: 0)
As @maccettura commented, you could try something like this:
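// Requires: using System;
string text = "..."; // the whole file content, e.g. from File.ReadAllText
// Join the lines with spaces (assumes the text uses the platform's line ending),
// then collapse the double spaces this can create
text = text.Replace(System.Environment.NewLine, " ").Replace("  ", " ");
// RemoveEmptyEntries drops the empty piece after the final terminator
var sentences = text.Split(new char[] { '.', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in sentences)
{
    Console.WriteLine(s.Trim());
}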
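Note that Split with a char array discards the terminating punctuation. If the '.', '!' or '?' should stay attached to its sentence, one possible variation is to split on the whitespace that follows a terminator, using a lookbehind. This is only a sketch and not part of the answer above; the names SplitDemo and SplitKeepingTerminators are hypothetical. Like the original, it treats every terminator as the end of a sentence.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: splits text into sentences and keeps the end punctuation.
    public static string[] SplitKeepingTerminators(string text)
    {
        // Collapse line breaks and repeated spaces into single spaces first.
        string flat = Regex.Replace(text, @"\s+", " ").Trim();
        if (flat.Length == 0)
            return new string[0];

        // Split at whitespace that is preceded by '.', '!' or '?'.
        // The lookbehind consumes nothing, so the terminator stays in the sentence.
        return Regex.Split(flat, @"(?<=[.!?])\s+");
    }

    public static void Main()
    {
        string text = "I went to a shop. Did I forget\nthe potatoes? Yes!";
        foreach (string s in SplitKeepingTerminators(text))
            Console.WriteLine(s);
    }
}
```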
Answer 3 (score: 0)
I don't know how long your text is, so, just in case, I would process it one sentence at a time.
Something like this:
// Requires: using System; using System.IO;
char[] periods = { '.', '!', '?' }; // or any other separators you like
string line;
string sentence = "";
using (StreamReader reader = new StreamReader("filename.txt"))
{
    while ((line = reader.ReadLine()) != null)
    {
        if (line.IndexOfAny(periods) < 0)
        {
            // no terminator on this line: the sentence continues on the next one
            sentence += " " + line.Trim();
            continue;
        }
        // StringSplitOptions.None keeps the empty piece after a trailing period,
        // so lines ending with a period are handled correctly
        string[] pieces = line.Split(periods, StringSplitOptions.None);
        for (int i = 0; i < pieces.Length; i++)
        {
            sentence += " " + pieces[i].Trim(); // append this piece to the current sentence
            // the last piece has no terminator yet, it continues on the next line
            if (i < pieces.Length - 1)
            {
                // do whatever you want with the sentence; process() stands for your own method
                if (!string.IsNullOrWhiteSpace(sentence))
                    process(sentence.Trim());
                sentence = ""; // reset for the next sentence
            }
        }
    }
    // this step is only required if the last sentence may end without a period
    if (!string.IsNullOrWhiteSpace(sentence))
        process(sentence.Trim());
}
(Note that if you know you are only dealing with small files, you don't need all of this and can use the earlier suggestions.)
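When the file is large enough that streaming matters, the line-by-line idea can also be packaged as an iterator, so that callers simply loop over sentences. The following is a sketch under the same assumption as above (a sentence ends at '.', '!' or '?'). The names SentenceReader and ReadSentences are hypothetical, and the method takes a TextReader so it can be tried with a StringReader instead of a real file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SentenceReader
{
    private static readonly char[] Terminators = { '.', '!', '?' };

    // Hypothetical helper: lazily yields sentences, reading one line at a time.
    public static IEnumerable<string> ReadSentences(TextReader reader)
    {
        var current = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] pieces = line.Split(Terminators);
            for (int i = 0; i < pieces.Length; i++)
            {
                string piece = pieces[i].Trim();
                if (piece.Length > 0)
                {
                    if (current.Length > 0) current.Append(' ');
                    current.Append(piece);
                }
                // Every piece except the last one was followed by a terminator.
                if (i < pieces.Length - 1)
                {
                    if (current.Length > 0) yield return current.ToString();
                    current.Clear();
                }
            }
        }
        // Trailing text without a terminator is still returned.
        if (current.Length > 0) yield return current.ToString();
    }

    public static void Main()
    {
        string text = "I'm on my way\nto the store\nto buy potatoes. Done";
        using (var reader = new StringReader(text))
        {
            foreach (string s in ReadSentences(reader))
                Console.WriteLine(s);
        }
    }
}
```

To read from a file instead, a StreamReader can be passed in place of the StringReader, since both derive from TextReader.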