Extracting a specific word from a text file in C#

Asked: 2017-01-05 17:18:01

Tags: c# console-application

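I am currently building a method that uses a StreamReader to go through a text file. I would like to change the current method, shown below, to use a regular expression or something similar.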

using (StreamReader fs = File.OpenText(FilePath))
    {

        int count = 0; //counts the number of times wordResponse is found.
        int lineNumber = 0;
        while (!fs.EndOfStream)
        {
            string line = fs.ReadLine();
            lineNumber++;
            int position = line.IndexOf(WordSearch);
            if (position != -1)
            {
                count++;
                Console.WriteLine("Match#{0} line {1}: {2}", count, lineNumber, line);
            }
        }

        if (count == 0)
        {
            Console.WriteLine("your word was not found!");
        }
        else
        {
            Console.WriteLine("Your word was found " + count + " times!");
        }
        Console.WriteLine("Press enter to quit.");
        Console.ReadKey();
    }

The output I get from the current method is:

Match#1 line 3: Proin eleifend tortor velit, **True** quis aliquam arcu congue ut. Fusce sed mattis purus, sed vehicula diam. Nullam in leo sit amet massa pharetra semper et vel diam.
Match#2 line 7: lobortis nisl. Fusce dignissim ligula **True** a nunc maximus, vitae sollicitudin erat dictum. Vivamus commodo massa a tellus gravida posuere.
Match#3 line 17: **True** Sed pellentesque ipsum vel neque accumsan, quis fermentum augue pretium. Praesent fermentum risus nec ultricies sodales.
Match#4 line 24: Fusce nulla risus, ornare in eleifend id, **True** tincidunt eu sem. Donec enim sapien, rhoncus vitae ex lobortis, sagittis molestie libero.
Your word was found 4 times!
Press enter to quit.

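As you can see, I get the entire line, when all I want is the one word from each line. The word being searched for is True.

I believe the line string line = fs.ReadLine(); is where I need some extra steps to get the result I want.

Any tips or pointers would be greatly appreciated.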

3 Answers:

Answer 0: (score: 1)

It is as simple as printing the search term instead of the line:

Console.WriteLine("Match#{0} line {1}: {2}", count, lineNumber, WordSearch);

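Answer 1: (score: 0)

You just need to add this inside the if (position != -1) block, since Substring throws when position is -1:

var word = line.Substring(position, WordSearch.Length);

and then print word instead of line: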

Console.WriteLine("Match#{0} line {1}: {2}", count, lineNumber, word);
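
Putting the two lines above into the question's loop gives roughly the following. This is only a minimal sketch: the class and method names (WordFinder, PrintMatches) are hypothetical, it takes a TextReader rather than a file path so it can be tried without a file, and it assumes a case-sensitive ordinal comparison, which the question does not specify.

```csharp
using System;
using System.IO;

static class WordFinder
{
    // Hypothetical helper: for every line that contains wordSearch, prints only
    // the matched word instead of the whole line. Returns the number of matching lines.
    public static int PrintMatches(TextReader reader, string wordSearch)
    {
        int count = 0;
        int lineNumber = 0;
        string line;

        // ReadLine returns null at the end of the input.
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;

            // Assumption: case-sensitive, culture-independent search.
            int position = line.IndexOf(wordSearch, StringComparison.Ordinal);
            if (position != -1)
            {
                count++;
                // Cut out just the matched text; only safe because position != -1 here.
                string word = line.Substring(position, wordSearch.Length);
                Console.WriteLine("Match#{0} line {1}: {2}", count, lineNumber, word);
            }
        }

        return count;
    }
}

static class SubstringDemo
{
    static void Main()
    {
        // In-memory stand-in for the text file from the question.
        var text = "Proin eleifend True quis\nno match on this line\nTrue Sed pellentesque";
        int found = WordFinder.PrintMatches(new StringReader(text), "True");
        Console.WriteLine("Your word was found " + found + " times!");
    }
}
```

To read from a file as in the question, pass File.OpenText(FilePath) in place of the StringReader. Like the original loop, this counts at most one match per line.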

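Answer 2: (score: 0)

"I would like to use a regular expression or something similar..."

Since you mentioned that you want to change your current implementation to use a regular expression, I will offer this snippet: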

var matches = Regex.Match(line, $".*({WordSearch})\\b.*", RegexOptions.IgnoreCase);
if (matches.Captures.Count > 0)
{
    count++;
    Console.WriteLine($"Match#{count} line {lineNumber}: {matches.Groups[1]}");
}        

Passing RegexOptions.IgnoreCase to Regex.Match seems appropriate here, and the \b in the expression is there to limit partial matches.
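
One possible variation on the snippet above, as a sketch rather than a definitive version: it passes the search term through Regex.Escape so that regex metacharacters in it are treated literally, puts \b on both sides of the term, and uses Regex.Matches so that several occurrences on one line are each reported. The names RegexWordFinder and FindWords are hypothetical, and the \b anchors assume the search term starts and ends with a word character (a letter, digit, or underscore).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class RegexWordFinder
{
    // Hypothetical helper: returns the matched text of every whole-word,
    // case-insensitive occurrence of wordSearch, printing each one as it is found.
    public static List<string> FindWords(TextReader reader, string wordSearch)
    {
        // Regex.Escape makes characters such as '.' or '+' in the term literal.
        var pattern = new Regex(@"\b" + Regex.Escape(wordSearch) + @"\b",
                                RegexOptions.IgnoreCase);
        var found = new List<string>();
        int lineNumber = 0;
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;

            // Matches yields every non-overlapping occurrence on the line.
            foreach (Match m in pattern.Matches(line))
            {
                // m.Value is the text as it appears in the file, so its casing
                // may differ from wordSearch.
                found.Add(m.Value);
                Console.WriteLine($"Match#{found.Count} line {lineNumber}: {m.Value}");
            }
        }

        return found;
    }
}

static class RegexDemo
{
    static void Main()
    {
        // In-memory stand-in for the text file from the question.
        var text = "True or true, but not Untrue\nTRUE.";
        var words = RegexWordFinder.FindWords(new StringReader(text), "true");
        Console.WriteLine("Your word was found " + words.Count + " times!");
    }
}
```

Because the count here is per occurrence rather than per line, the total can be higher than what the IndexOf version in the question reports for the same file.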