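This is what I have done so far. The problem is that if a conjunction occurs twice in a sentence, the code does not handle its second occurrence. Could an expert help?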
private void SplitSentence_Click(object sender, EventArgs e)
{
    richTextBox2.Text = "";
    richTextBox3.Text = "";
    string[] keywords = { " or ", " and ", " hence", "so that", "however", " because" };
    string[] sentences = SentenceTokenizer(richTextBox1.Text);
    string remSentence;
    foreach (string sentence in sentences)
    {
        remSentence = sentence;
        richTextBox3.Text = remSentence;
        for (int i = 0; i < keywords.Length; i++)
        {
            if ((remSentence.Contains(keywords[i])))// || (remSentence.IndexOf(keywords[i]) > 0))
            {
                richTextBox2.Text += remSentence.Substring(0, remSentence.IndexOf(keywords[i])) + '\n' + keywords[i] + '\n';
                remSentence = remSentence.Substring(remSentence.IndexOf(keywords[i]) + keywords[i].Length);
            }
        }
        richTextBox2.Text += remSentence;
    }
}
public static string[] SentenceTokenizer(string text)
{
    char[] sentdelimiters = new char[] { '.', '?', '۔', '؟', '\r', ':', '-' }; // '{ ',' }', '( ', ' )', ' [', ']', '>', '<','-', '_', '= ', '+','|', '\\', ':', ';', ' ', '\'', ',', '.', '/', '?', '~', '!','@', '#', '$', '%', '^', '&', '*', ' ', '\r', '\n', '\t'};
    // text.Remove('\n');
    return text.Split(sentdelimiters, StringSplitOptions.RemoveEmptyEntries);
}
Answer 0 (score: 1)
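You can use a regular expression to handle this instead of doing it by hand. I'll use English in my example so that I don't accidentally butcher the poor Urdu.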
using System.Text.RegularExpressions;
// Verbatim string (@"...") so that \b is a regex word boundary, not a backspace character.
Regex r = new Regex(@"\b(and|or|hence)");
sentence = r.Replace(sentence, "|"); // Just something unlikely to be normal.
string[] phrases = sentence.Split('|'); // Each piece between conjunctions.
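You may need to adjust for case (?) and for the possibility that a conjunction is part of another word (I used a leading space, or the word boundary from @Drahcir's suggestion, to get things started). For using backreferences in the .NET flavor, see this answer.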
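As a rough sketch of how those adjustments might look, the snippet below uses Regex.Split with a capturing group, which in .NET keeps the matched conjunctions in the result array, so every occurrence is handled and the conjunctions are not lost. The class name ConjunctionSplitter, the method name Split, and the English keyword list are assumptions made for illustration; they are not part of the original post, and the list would need to be replaced for Urdu text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not from the original post.
public static class ConjunctionSplitter
{
    // \b on both sides avoids matching "or" inside "order";
    // IgnoreCase handles "And" at the start of a clause.
    // The parentheses form a capturing group, so Regex.Split
    // returns the conjunctions as their own array elements.
    private static readonly Regex Conjunctions = new Regex(
        @"\b(and|or|hence|because|however|so that)\b",
        RegexOptions.IgnoreCase);

    public static string[] Split(string sentence)
    {
        return Conjunctions.Split(sentence)
            .Select(piece => piece.Trim())      // drop the spaces around each piece
            .Where(piece => piece.Length > 0)   // skip empty pieces at the edges
            .ToArray();
    }

    public static void Main()
    {
        // Prints each phrase and each conjunction on its own line.
        foreach (string piece in Split("I ran and I fell and I got up or so I think"))
        {
            Console.WriteLine(piece);
        }
    }
}
```

With the sample sentence this yields seven pieces: "I ran", "and", "I fell", "and", "I got up", "or", "so I think". Joining the pieces with '\n' gives the same layout the question builds up in richTextBox2.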