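I ran into a problem while trying to parse text into sentences. Everything works fine when the text is formatted like this (random text):
Much did had call new drew that kept. Limits expect wonder law she. Now has you views woman noisy match money rooms.
The program parses this text into 3 sentences.
But as soon as there is a line break in the middle of a sentence, my program splits the text incorrectly.
Much did had call new drew that kept. Limits expect (new line here) wonder law she. Now has you views woman noisy match money rooms.
The program parses this text into 4 sentences.
My code: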
public static void ReadData()
{
    char[] sentenceSeparators = {'.', '!', '?'};
    using (StreamReader reader = new StreamReader(dataFile))
    {
        string line = null;
        while (null != (line = reader.ReadLine()))
        {
            var split = line.Split(sentenceSeparators, StringSplitOptions.RemoveEmptyEntries);
            foreach (var i in split)
            {
                Console.WriteLine(i);
            }
        }
    }
}
Input #1:
Much did had call new drew that kept. Limits expect wonder law she.
Now has you views woman noisy match money rooms.
Output #1:
Much did had call new drew that kept
Limits expect wonder law she
Now has you views woman noisy match money rooms
Input #2:
Much did had call new drew that kept. Limits expect
wonder law she.
Now has you views woman noisy match money rooms.
Output #2:
Much did had call new drew that kept
Limits expect
wonder law she
Now has you views woman noisy match money rooms
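Answer 0 (score: 1)
This happens because you are using ReadLine. It returns one line at a time, so a sentence that spans two lines is split separately on each line and ends up as two pieces. Use ReadToEnd instead.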
public static void ReadData()
{
    char[] sentenceSeparators = {'.', '!', '?'};
    using (StreamReader reader = new StreamReader(dataFile))
    {
        // Read the whole file at once so line breaks no longer act as split points.
        string text = reader.ReadToEnd();
        var split = text.Split(sentenceSeparators, StringSplitOptions.RemoveEmptyEntries);
        foreach (var i in split)
        {
            Console.WriteLine(i);
        }
    }
}
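To see the difference without touching a file, here is a minimal sketch that runs both approaches on Input #2 held in a string. The class and method names (SplitDemo, SplitPerLine, SplitWholeText) are made up for illustration, and StringReader stands in for the StreamReader over dataFile.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SplitDemo
{
    static readonly char[] SentenceSeparators = { '.', '!', '?' };

    // Mimics the question's loop: every line is split on its own,
    // so a sentence that wraps onto the next line yields two pieces.
    public static List<string> SplitPerLine(string text)
    {
        var result = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                result.AddRange(line.Split(SentenceSeparators, StringSplitOptions.RemoveEmptyEntries));
            }
        }
        return result;
    }

    // Mimics the answer: the whole text is split in one call,
    // so only the separator characters act as split points.
    public static string[] SplitWholeText(string text)
    {
        return text.Split(SentenceSeparators, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Input #2 from the question, with no newline after the last period.
        string input = "Much did had call new drew that kept. Limits expect\n"
                     + "wonder law she.\n"
                     + "Now has you views woman noisy match money rooms.";

        Console.WriteLine(SplitPerLine(input).Count);    // prints 4
        Console.WriteLine(SplitWholeText(input).Length); // prints 3
    }
}
```

Note that the pieces returned by the whole-text split still contain the original line breaks and leading spaces; only the number of pieces changes.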
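Answer 1 (score: 1)
As already mentioned, if you don't want \n to affect your splitting, don't read the file line by line. Here is a version that does the job in one line:
string[] split = File.ReadAllText(dataFile).Split(sentenceSeparators, StringSplitOptions.RemoveEmptyEntries);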
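Also: the display in the console is misleading. The "bad" sentence is printed on 2 lines because it still contains the line break, but in the split array it occupies a single element!
Console.WriteLine(split.Length); // displays 3 for Input #2, assuming nothing (not even a newline) follows the final period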
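If you also want the console output to show one sentence per line, one option is to clean up each piece after splitting. The sketch below is only one way to do it, and the names SentenceCleaner and SplitSentences are hypothetical. It collapses every run of whitespace (including line breaks) into a single space, trims the result, and drops pieces that end up empty, which also covers a file that ends with a newline after the last period.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SentenceCleaner
{
    static readonly char[] SentenceSeparators = { '.', '!', '?' };

    public static string[] SplitSentences(string text)
    {
        return text
            .Split(SentenceSeparators, StringSplitOptions.RemoveEmptyEntries)
            // Collapse line breaks and repeated spaces into one space, then trim the ends.
            .Select(s => Regex.Replace(s, @"\s+", " ").Trim())
            // A trailing newline leaves a whitespace-only piece; drop it.
            .Where(s => s.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        // Input #2 from the question, this time with a trailing newline.
        string text = "Much did had call new drew that kept. Limits expect\n"
                    + "wonder law she.\n"
                    + "Now has you views woman noisy match money rooms.\n";

        foreach (string sentence in SplitSentences(text))
        {
            Console.WriteLine(sentence);
        }
    }
}
```

With real files you would pass File.ReadAllText(dataFile) to SplitSentences instead of the hard-coded string.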