Splitting a string into sentences that span multiple lines

Time: 2017-11-21 20:39:40

Tags: c# regex string char

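I am trying to read text from a file and work with it. The problem is that I need to split it into sentences, and I cannot figure out how to do that...

Here is a sample of the text file: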

I went to a shop. I bought a pack of sausages 

and some milk. Sadly I forgot about the potatoes. I'm on my way 

to the store

to buy potatoes.

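As you can see, a sentence can span several lines before it ends. I know I should probably use a regular expression, but I cannot come up with one...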

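4 Answers:

Answer 0: (score: 0)

Assuming you define a sentence as any non-empty part of the input that is delimited by periods.

Perhaps something like this: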

(?<=^|\.)(.+?)(\.|$)

The key is probably to use the RegexOptions.Singleline option, so that . matches any character (instead of any character except \n).
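
To see what the option changes, here is a minimal sketch. The class and method names (SinglelineDemo, DotMatches) and the test pattern "a.b" are made up for this illustration only and are not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Hypothetical helper: reports whether the pattern "a.b" matches the input
    // under the given options.
    public static bool DotMatches(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, "a.b", options);
    }

    public static void Main()
    {
        string input = "a\nb";
        Console.WriteLine(DotMatches(input, RegexOptions.None));       // False: '.' does not match \n
        Console.WriteLine(DotMatches(input, RegexOptions.Singleline)); // True: '.' also matches \n
    }
}
```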

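To explain the pattern above in more detail:

  1. (?<=^|\.) is a zero-width positive lookbehind assertion. It requires the match to start either at the beginning of the input or right after a period. That period itself does not become part of the match.
  2. (.+?) is the content of your sentence. The +? quantifier is called lazy because it tries to match as little of the input as possible. This is needed so that it does not swallow the period that belongs to the next part of the pattern, or the sentences that follow.
  3. (\.|$) matches the sentence terminator or the end of the input.

A complete working example: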

    // Requires: using System.Text.RegularExpressions;
    Regex r = new Regex(@"(?<=^|\.)(.+?)(\.|$)", RegexOptions.Singleline);
    String input = @"I went to a shop. I bought a pack of sausages
    and some milk. Sadly I forgot about the potatoes. I'm on my way
    to the store
    to buy potatoes.";
    foreach (Match match in r.Matches(input))
    {
        // match.Value still includes the closing period and any line breaks
        string sentence = match.Value;
    }
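
The matches produced by this pattern still carry the line breaks and leading spaces of the input. The following is only a sketch of one way to tidy them up, assuming you want each sentence on a single line. The names RegexSentences and Extract are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexSentences
{
    // Same pattern as in the answer; Singleline lets '.' cross line breaks.
    private static readonly Regex SentencePattern =
        new Regex(@"(?<=^|\.)(.+?)(\.|$)", RegexOptions.Singleline);

    // Hypothetical helper: returns the sentences with whitespace normalized.
    public static List<string> Extract(string input)
    {
        var result = new List<string>();
        foreach (Match match in SentencePattern.Matches(input))
        {
            // Collapse line breaks and repeated spaces into single spaces.
            string sentence = Regex.Replace(match.Value, @"\s+", " ").Trim();
            if (sentence.Length > 0) // skip matches that were whitespace only
                result.Add(sentence);
        }
        return result;
    }

    public static void Main()
    {
        string input = "I went to a shop. I bought a pack of sausages\n" +
                       "and some milk. Sadly I forgot about the potatoes.";
        foreach (string s in Extract(input))
            Console.WriteLine(s);
    }
}
```

Note that the pattern only treats a period as a terminator, so text such as "e.g." or "3.14" would be cut in the middle.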
    

Answer 1: (score: 0)

I tried joining the separate lines into one continuous string and then splitting that into sentences.

Here is the approach I tried:

range

Let me know if there is a better way to do this.

Answer 2: (score: 0)

As @maccettura commented, you could try something like this.

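    string text = "...";
    // join the lines, then collapse the doubled spaces that joining can leave behind
    text = text.Replace(System.Environment.NewLine, " ").Replace("  ", " ");
    // RemoveEmptyEntries drops the empty piece after the final terminator
    var sentences = text.Split(new char[] { '.', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);
    foreach (string s in sentences)
    {
        Console.WriteLine(s.Trim());
    }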
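
Splitting on the terminator characters throws the terminators away. If you want to keep them, one possible variation (a sketch, not part of the answer) is to flatten the whitespace first and then split on the space that follows a terminator. SentenceSplitter and Sentences are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SentenceSplitter
{
    // Hypothetical helper: splits text into sentences and keeps '.', '!' and '?'.
    public static string[] Sentences(string text)
    {
        // Turn every run of whitespace (including line breaks) into one space.
        string flat = Regex.Replace(text, @"\s+", " ").Trim();
        if (flat.Length == 0)
            return new string[0];

        // Split on the space that follows a terminator. The lookbehind is
        // zero-width, so the terminator stays attached to its sentence.
        return Regex.Split(flat, @"(?<=[.!?]) ");
    }

    public static void Main()
    {
        string text = "I went to a shop. I bought a pack of sausages\nand some milk!";
        foreach (string s in Sentences(text))
            Console.WriteLine(s);
    }
}
```

Like the plain Split version, this treats every '.', '!' or '?' followed by a space as the end of a sentence, so abbreviations will produce extra pieces.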

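Answer 3: (score: 0)

I don't know how long your text is, so just in case I would read the file line by line and build up one sentence at a time.

Something like this: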

        // Requires: using System.IO;
        // process(...) stands for whatever you want to do with a finished sentence
        char[] periods = {'.', '!', '?'}; // or any other separator you may like

        string      line       = "";
        string      sentence   = "";

        using (StreamReader reader = new StreamReader ("filename.txt"))
        {
            while ((line = reader.ReadLine()) != null)
            {
                if (line.IndexOfAny(periods)<0)
                {
                    // no terminator on this line: the sentence continues on the next line
                    sentence = (sentence + " " + line.Trim()).Trim();
                    continue;
                }

                // I'm using StringSplitOptions.None here so we handle lines ending with a period right
                string[] sentences = line.Split(periods, StringSplitOptions.None);

                for (int i = 0; i < sentences.Length; i++)
                {
                    // append the current piece (not the whole line) to the sentence
                    sentence = (sentence + " " + sentences[i].Trim()).Trim();

                    // the last piece is not finished yet, as it will continue on the next line
                    if (i < sentences.Length - 1)
                    {
                        // do whatever you want with the sentence
                        if (!string.IsNullOrEmpty(sentence))
                            process(sentence);

                        sentence = ""; // clean for next sentence
                    }
                }

            }

            // this step is only required if the last sentence might end without a period
            // do whatever you want with the sentence
            if (!string.IsNullOrEmpty(sentence))
                process(sentence);
        }

(Note that if you know you are only dealing with small files, you don't need all of this and can use the earlier suggestions.)
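
The line-by-line idea can also be packaged as an iterator, so that the caller decides what to do with each sentence and the whole file is never held in memory. This is a sketch under my own assumptions: StreamingSentences and Read are hypothetical names, and a StringReader stands in for the file so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamingSentences
{
    private static readonly char[] Terminators = { '.', '!', '?' };

    // Hypothetical helper: lazily yields sentences while reading one line at a time.
    public static IEnumerable<string> Read(TextReader reader)
    {
        string sentence = "";
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] pieces = line.Split(Terminators);
            for (int i = 0; i < pieces.Length; i++)
            {
                // Append the piece; the outer Trim avoids stray spaces from blank lines.
                sentence = (sentence + " " + pieces[i].Trim()).Trim();

                // Every piece except the last one was followed by a terminator.
                if (i < pieces.Length - 1)
                {
                    if (sentence.Length > 0)
                        yield return sentence;
                    sentence = "";
                }
            }
        }

        // Text left over after the last terminator, if any.
        if (sentence.Length > 0)
            yield return sentence;
    }

    public static void Main()
    {
        string text = "I'm on my way\nto the store\nto buy potatoes.";
        using (var reader = new StringReader(text))
        {
            foreach (string s in Read(reader))
                Console.WriteLine(s); // prints: I'm on my way to the store to buy potatoes
        }
    }
}
```

For a real file you could pass a StreamReader instead of the StringReader, since both derive from TextReader. The terminators themselves are dropped by Split, as in the answer above.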