I want to remove stop words from my text file, and I wrote the following code for that purpose:
TextWriter tw = new StreamWriter("D:\\output.txt");
private void button1_Click(object sender, EventArgs e)
{
    StreamReader reader = new StreamReader("D:\\input1.txt");
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        string[] parts = line.Split(' ');
        string[] stopWord = new string[] { "is", "are", "am","could","will" };
        foreach (string word in stopWord)
        {
            line = line.Replace(word, "");
            tw.Write("+"+line);
        }
        tw.Write("\r\n");
    }
}
But it does not show any result in the output file; the output file stays empty.
Answer 0 (score: 6)
A regular expression is probably a good fit for this job:
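// Verbatim string (@"...") so that \b reaches the regex engine as a word boundary
// instead of being compiled as a backspace character.
Regex replacer = new Regex(@"\b(?:is|are|am|could|will)\b");
using (TextWriter writer = new StreamWriter("C:\\output.txt"))
{
    using (StreamReader reader = new StreamReader("C:\\input.txt"))
    {
        while (!reader.EndOfStream)
        {
            string line = reader.ReadLine();
            // Replace returns a new string, so the result has to be assigned back.
            line = replacer.Replace(line, "");
            writer.WriteLine(line);
        }
    }
    writer.Flush();
}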
This approach replaces only whole words with an empty string; a stop word that is part of another word is left untouched.
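As a minimal sketch of the whole-word behaviour described above, the snippet below applies the same pattern to a single string instead of a file. The class name `StopWordRegexDemo`, the `RemoveStopWords` helper, the `RegexOptions.IgnoreCase` flag, and the second regex that collapses leftover spaces are assumptions added for illustration; they are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StopWordRegexDemo
{
    // \b anchors the match at word boundaries, so "is" inside "this" is not matched.
    // IgnoreCase is an added assumption so that "Is" and "IS" are removed as well.
    private static readonly Regex StopWords =
        new Regex(@"\b(?:is|are|am|could|will)\b", RegexOptions.IgnoreCase);

    // Matches runs of two or more spaces left behind after a word is removed.
    private static readonly Regex ExtraSpaces = new Regex(@" {2,}");

    public static string RemoveStopWords(string line)
    {
        string stripped = StopWords.Replace(line, "");
        return ExtraSpaces.Replace(stripped, " ").Trim();
    }

    public static void Main()
    {
        // Prints: this what we looking for
        Console.WriteLine(RemoveStopWords("this is what we are looking for"));
    }
}
```

Note that "this" survives intact even though it contains "is", which is the difference from a plain `string.Replace`.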
Good luck.
Answer 1 (score: 2)
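The following works as expected. However, it is not a good approach, because it removes stop words even when they are part of a larger word. It also does not clean up the extra spaces left behind by the removed words.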
string[] stopWord = new string[] { "is", "are", "am","could","will" };
TextWriter writer = new StreamWriter("C:\\output.txt");
StreamReader reader = new StreamReader("C:\\input.txt");
string line;
while ((line = reader.ReadLine()) != null)
{
    foreach (string word in stopWord)
    {
        line = line.Replace(word, "");
    }
    writer.WriteLine(line);
}
reader.Close();
writer.Close();
Also, I recommend using a using statement when creating the streams, to make sure the files are closed promptly.
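One way to avoid both drawbacks mentioned in this answer (stop words removed from inside larger words, and leftover spaces) is to compare whole tokens instead of substrings. The sketch below is a swapped-in alternative, not the answer's own method; the class name `StopWordTokenDemo`, the `RemoveStopWords` helper, and the case-insensitive comparison are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordTokenDemo
{
    // Case-insensitive lookup is an assumption; the original code compares case-sensitively.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(new[] { "is", "are", "am", "could", "will" },
                            StringComparer.OrdinalIgnoreCase);

    public static string RemoveStopWords(string line)
    {
        // RemoveEmptyEntries drops the empty tokens produced by repeated spaces.
        IEnumerable<string> kept = line
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => !StopWords.Contains(token));

        // Joining with a single space leaves no gaps where words were removed.
        return string.Join(" ", kept);
    }

    public static void Main()
    {
        // Prints: this a sample
        Console.WriteLine(RemoveStopWords("this is a sample"));
    }
}
```

Because only complete tokens are compared, "this" and "sample" are kept even though they contain "is" and "am". A token with punctuation attached, such as "is,", would not match a stop word in this sketch.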
Answer 2 (score: 1)
You should wrap the IO objects in using statements so that they are disposed of properly.
// TextWriter is abstract, so the concrete StreamWriter is constructed here.
using (TextWriter tw = new StreamWriter("D:\\output.txt"))
{
    using (StreamReader reader = new StreamReader("D:\\input1.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] parts = line.Split(' ');
            string[] stopWord = new string[] { "is", "are", "am","could","will" };
            foreach (string word in stopWord)
            {
                line = line.Replace(word, "");
                tw.Write("+"+line);
            }
        }
    }
}
Answer 3 (score: 0)
Try wrapping the StreamWriter and the StreamReader in using() {} clauses.
using (TextWriter tw = new StreamWriter(@"D:\output.txt"))
{
    ...
}
You may also want to call tw.Flush() at the end.
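To illustrate why the using block and the flush matter, here is a small sketch that writes through a StreamWriter into a MemoryStream instead of a file, so the effect of buffering can be observed directly. The class name `FlushDemo` and the `Measure` helper are hypothetical, and the in-memory stream is a stand-in chosen for the demonstration.

```csharp
using System;
using System.IO;

public static class FlushDemo
{
    // Returns the number of bytes in the stream before and after the writer is disposed.
    public static long[] Measure()
    {
        var stream = new MemoryStream();
        long before;

        using (var writer = new StreamWriter(stream))
        {
            writer.Write("hello");
            // The text still sits in the writer's internal buffer at this point.
            before = stream.Length;
        }

        // Leaving the using block disposed the writer, which flushed its buffer.
        // ToArray is still usable after the underlying MemoryStream has been closed.
        long after = stream.ToArray().Length;
        return new[] { before, after };
    }

    public static void Main()
    {
        long[] sizes = Measure();
        Console.WriteLine("before dispose: " + sizes[0]);
        Console.WriteLine("after dispose: " + sizes[1]);
    }
}
```

A writer that is never flushed, closed, or disposed can leave its target empty in the same way, which matches the symptom described in the question.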