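How do I split text into paragraphs?

Asked: 2016-01-22 06:10:25

Tags: c#

I need to split a string into paragraphs and count them (paragraphs are separated by 2 or more blank lines). I also need to read every word in the text and be able to say which paragraph each word belongs to.

For example (each paragraph spans more than one line, and two blank lines separate the paragraphs):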

This is
the first
paragraph


This is 
the second
paragraph


This is 
the third
paragraph

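3 answers:

Answer 0: (score: 0)

I think you want to split the text into paragraphs, but do you have a delimiter that tells you where the string should be split? For example, if you want to use "." to mark the end of a paragraph, this should do the trick: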

string paragraphs = "My first paragraph. Once upon a time";

string[] parts = paragraphs.Split('.');

foreach (string part in parts)
{
    // Trim removes the space that follows each "." in the input.
    Console.WriteLine(part.Trim());
}

The result will be:

My first paragraph
Once upon a time

Keep in mind that the "." character itself is removed by the split!
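If the delimiter should stay attached to the text, one option is to split with a regular expression instead of `String.Split`. The following is only a sketch: the class and method names (`SentenceSplitDemo`, `SplitKeepingPeriod`) are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SentenceSplitDemo
{
    // Hypothetical helper: splits after each '.', keeping the period
    // attached to the piece that precedes it.
    public static string[] SplitKeepingPeriod(string text)
    {
        // (?<=\.) is a lookbehind: it matches the position right after a '.'
        // without consuming it; \s* swallows any whitespace that follows.
        return Regex.Split(text, @"(?<=\.)\s*")
                    .Where(piece => piece.Length > 0) // drop the empty tail after a final '.'
                    .ToArray();
    }

    public static void Main()
    {
        foreach (var piece in SplitKeepingPeriod("My first paragraph. Once upon a time"))
        {
            Console.WriteLine(piece);
        }
        // Prints:
        // My first paragraph.
        // Once upon a time
    }
}
```

Note that this splits at every period, so text such as "3.14" would also be cut in two.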

Answer 1: (score: 0)

Something like this should work for you:

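        // A blank line (two consecutive line breaks) marks a paragraph boundary.
        var paragraphMarker = Environment.NewLine + Environment.NewLine;
        var paragraphs = fileText.Split(new[] {paragraphMarker},
                                        StringSplitOptions.RemoveEmptyEntries);
        foreach (var paragraph in paragraphs)
        {
            // Split on line breaks as well as spaces, because a paragraph
            // can span several lines.
            var words = paragraph.Split(new[] {' ', '\t', '\r', '\n'},
                                  StringSplitOptions.RemoveEmptyEntries)
                                 .Select(w => w.Trim());
            //do something
        }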

You may need to change the line separator, because files can use different variants such as "\n", "\r", or "\r\n".
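One way to avoid depending on the separator is to normalize every line ending to "\n" before splitting. A minimal sketch, assuming the whole text fits in memory; `LineEndingDemo` and `NormalizeLineEndings` are hypothetical names:

```csharp
using System;

public static class LineEndingDemo
{
    // Hypothetical helper: converts "\r\n" and lone "\r" to "\n".
    public static string NormalizeLineEndings(string text)
    {
        // Order matters: handle "\r\n" first so it does not turn into two newlines.
        return text.Replace("\r\n", "\n").Replace("\r", "\n");
    }

    public static void Main()
    {
        string mixed = "one\r\ntwo\rthree\nfour";
        string normalized = NormalizeLineEndings(mixed);
        Console.WriteLine(normalized.Split('\n').Length); // prints 4
    }
}
```

After normalizing, the paragraph marker can simply be "\n\n" regardless of where the file came from.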

You can also pass specific characters to the Trim function to strip symbols such as '.', ',', '!', and '"'.
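For instance, `String.Trim(char[])` removes the given characters from both ends of each word. This is a rough sketch; the names `TrimDemo` and `CleanWords` are invented here, and the punctuation set is just the four characters mentioned above:

```csharp
using System;
using System.Linq;

public static class TrimDemo
{
    // Characters stripped from the start and end of each word (an assumed set;
    // extend it as needed).
    private static readonly char[] Punctuation = { '.', ',', '!', '"' };

    // Hypothetical helper: splits on whitespace and trims punctuation.
    public static string[] CleanWords(string paragraph)
    {
        return paragraph
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.Trim(Punctuation))
            .Where(w => w.Length > 0) // a token made only of punctuation becomes empty
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", CleanWords("\"Hello, world!\" she said.")));
        // Prints: Hello|world|she|said
    }
}
```

Trim only touches the ends of a word, so punctuation inside it (as in "don't") is left alone.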

Edit: for more flexibility, you can use a regular expression to split the paragraphs:

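        // The group is non-capturing, so Regex.Split does not return the
        // separators as extra entries; {2,} also accepts longer runs of line breaks.
        var paragraphs = Regex.Split(fileText, @"(?:\r\n?|\n){2,}")
                              .Where(p => p.Any(char.IsLetterOrDigit));
        foreach (var paragraph in paragraphs)
        {
            // Split on line breaks as well as spaces, because a paragraph
            // can span several lines.
            var words = paragraph.Split(new[] {' ', '\t', '\r', '\n'},
                                  StringSplitOptions.RemoveEmptyEntries)
                                 .Select(w => w.Trim());
            //do something
        }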
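To cover the rest of the question (counting the paragraphs and knowing which paragraph each word belongs to), the loop can record a paragraph number next to every word. The sketch below is one possible way to do it, not the only one. `ParagraphWords`, `SplitParagraphs`, and `IndexWords` are hypothetical names, and it assumes that any run of two or more line breaks separates paragraphs; change `{2,}` to `{3,}` if a single blank line should not count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParagraphWords
{
    // Splits the text wherever two or more line breaks follow each other
    // (spaces or tabs on the otherwise blank lines are tolerated).
    public static List<string> SplitParagraphs(string text)
    {
        return Regex.Split(text, @"(?:[ \t]*(?:\r\n?|\n)){2,}")
                    .Where(p => !string.IsNullOrWhiteSpace(p))
                    .Select(p => p.Trim())
                    .ToList();
    }

    // Pairs every word with the 1-based number of the paragraph it occurs in.
    public static List<(int Paragraph, string Word)> IndexWords(string text)
    {
        var result = new List<(int Paragraph, string Word)>();
        var paragraphs = SplitParagraphs(text);
        for (int i = 0; i < paragraphs.Count; i++)
        {
            var words = paragraphs[i].Split(new[] { ' ', '\t', '\r', '\n' },
                                            StringSplitOptions.RemoveEmptyEntries);
            foreach (var word in words)
            {
                result.Add((i + 1, word));
            }
        }
        return result;
    }

    public static void Main()
    {
        string text = "This is\nthe first\nparagraph\n\n\n" +
                      "This is \nthe second\nparagraph\n\n\n" +
                      "This is \nthe third\nparagraph";

        Console.WriteLine(SplitParagraphs(text).Count); // prints 3
        foreach (var entry in IndexWords(text))
        {
            Console.WriteLine($"{entry.Paragraph}: {entry.Word}");
        }
    }
}
```

The paragraph count is simply the length of the list returned by `SplitParagraphs`, and each entry from `IndexWords` tells you where a word came from.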

Answer 2: (score: 0)

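    // Splits a string into chunks of at most `size` characters, breaking only
    // between words. A single word longer than `size` becomes its own chunk.
    public static List<string> SplitLine(string isstr, int size = 100)
    {
        var words = isstr.Split(new[] { ' ' },
                              StringSplitOptions.RemoveEmptyEntries);
        List<string> lo = new List<string>();
        string tmp = "";

        foreach (var word in words)
        {
            // Flush the current chunk when the next word (plus a separating
            // space) would push it past the limit.
            if (tmp.Length > 0 && (tmp.Length + 1 + word.Length) > size)
            {
                lo.Add(tmp);
                tmp = "";
            }
            // Add a space only between words, never at the start of a chunk.
            tmp += (tmp.Length > 0 ? " " : "") + word;
        }
        if (tmp.Length > 0)
        {
            lo.Add(tmp);
        }

        return lo;
    }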