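I need to split a string into paragraphs and count them (paragraphs are separated by two or more blank lines). I also need to read every word in the text and be able to say which paragraph each word belongs to.
For example (each paragraph spans several lines, and two blank lines separate the paragraphs):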
This is
the first
paragraph


This is
the second
paragraph


This is
the third
paragraph
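Answer 0 (score: 0)
I think you want to split the text into paragraphs, but do you have a delimiter that tells you where to split the string? For example, if you want to identify paragraphs by ".", this should do the trick: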
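string paragraphs = "My first paragraph. Once upon a time";
// Split on '.'; the separator itself is not kept in the results.
string[] words = paragraphs.Split('.');
foreach (string word in words)
{
    // Trim() drops the space that followed the '.' in the input.
    Console.WriteLine(word.Trim());
}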
The result will be:
My first paragraph
Once upon a time
Keep in mind that the "." character is removed!
Answer 1 (score: 0)
Something like this should work for you:
var paragraphMarker = Environment.NewLine + Environment.NewLine;
var paragraphs = fileText.Split(new[] {paragraphMarker},
                                StringSplitOptions.RemoveEmptyEntries);
foreach (var paragraph in paragraphs)
{
    var words = paragraph.Split(new[] {' '},
                                StringSplitOptions.RemoveEmptyEntries)
                         .Select(w => w.Trim());
    //do something
}
You may need to change the line separator, because files can use different variants such as "\n", "\r", and "\r\n".
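One way to cope with mixed line endings is to normalize them before splitting. The following is a minimal sketch, not part of the original answer; the sample text in `fileText` is made up for illustration, and it is written as C# top-level statements.

```csharp
using System;

// Hypothetical input that mixes "\r\n", "\r" and "\n" line endings.
string fileText = "first paragraph\r\n\r\nsecond paragraph\r\rthird paragraph\n\nfourth paragraph";

// Normalize everything to "\n". Replace "\r\n" first so that it
// does not turn into two separate line breaks.
string normalized = fileText.Replace("\r\n", "\n").Replace("\r", "\n");

// A blank line is now always "\n\n", whatever the file originally used.
string[] paragraphs = normalized.Split(new[] { "\n\n" },
                                       StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine(paragraphs.Length); // prints 4
```

After normalization the same paragraph marker works regardless of which platform produced the file.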
You can also pass specific characters to the Trim function to strip symbols such as '.', ',', '!', and '"'.
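As a rough illustration of passing characters to `Trim`, here is a small sketch. The sample sentence and the `punctuation` array are assumptions chosen for the example. Note that `Trim` only removes characters from the start and end of each word, so punctuation inside a word is left alone.

```csharp
using System;
using System.Linq;

// Hypothetical sample paragraph.
string paragraph = "\"Hello, world!\" she said.";

// Characters to strip from both ends of every word.
char[] punctuation = { '.', ',', '!', '"' };

string[] words = paragraph
    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(w => w.Trim(punctuation))
    .Where(w => w.Length > 0) // drop tokens that were punctuation only
    .ToArray();

Console.WriteLine(string.Join("|", words)); // prints Hello|world|she|said
```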
Edit: for more flexibility, you can use a regular expression to split the paragraphs:
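// Non-capturing group: Regex.Split would otherwise add the captured
// line breaks to the result. {2,} matches two or more consecutive line breaks.
var paragraphs = Regex.Split(fileText, @"(?:\r\n?|\n){2,}")
                      .Where(p => p.Any(char.IsLetterOrDigit));
foreach (var paragraph in paragraphs)
{
    var words = paragraph.Split(new[] {' '},
                                StringSplitOptions.RemoveEmptyEntries)
                         .Select(w => w.Trim());
    //do something
}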
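To tie this back to the question (count the paragraphs and know which paragraph each word belongs to), here is one possible sketch. It assumes that "two or more blank lines" means three or more consecutive line breaks. The sample text and the `wordsWithParagraph` list are illustrative names, not something from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sample text from the question: two blank lines between paragraphs.
string fileText = "This is\nthe first\nparagraph\n\n\n" +
                  "This is\nthe second\nparagraph\n\n\n" +
                  "This is\nthe third\nparagraph";

// Three or more consecutive line breaks = two or more blank lines.
string[] paragraphs = Regex.Split(fileText, @"(?:\r\n?|\n){3,}")
    .Where(p => p.Any(char.IsLetterOrDigit))
    .ToArray();

// Pair every word with the 1-based number of its paragraph.
var wordsWithParagraph = new List<(string Word, int Paragraph)>();
for (int i = 0; i < paragraphs.Length; i++)
{
    // A null separator array splits on any whitespace, so the line
    // breaks inside a paragraph also separate words.
    foreach (string word in paragraphs[i].Split((char[])null,
                 StringSplitOptions.RemoveEmptyEntries))
    {
        wordsWithParagraph.Add((word, i + 1));
    }
}

Console.WriteLine($"{paragraphs.Length} paragraphs"); // prints 3 paragraphs
foreach (var (word, paragraph) in wordsWithParagraph)
{
    Console.WriteLine($"{word} (paragraph {paragraph})");
}
```

Keeping the paragraph number next to each word avoids having to search the text again later to find where a word came from.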
Answer 2 (score: 0)
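// Splits a string into chunks of at most about `size` characters,
// breaking only between words.
public static List<string> SplitLine(string isstr, int size = 100)
{
    var words = isstr.Split(new[] { ' ' },
                            StringSplitOptions.RemoveEmptyEntries);
    List<string> lo = new List<string>();
    string tmp = "";
    for (int i = 0; i < words.Length; i++)
    {
        // Flush the current chunk if adding the next word (plus a space)
        // would exceed the limit; never add an empty chunk.
        if (tmp.Length > 0 && (tmp.Length + 1 + words[i].Length) > size)
        {
            lo.Add(tmp);
            tmp = "";
        }
        // Join words with a single space, without a leading space.
        tmp += (tmp.Length > 0 ? " " : "") + words[i];
    }
    if (!String.IsNullOrWhiteSpace(tmp))
    {
        lo.Add(tmp);
    }
    return lo;
}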