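Linear search for a string in C#

Asked: 2013-02-25 06:30:09

Tags: c# search text-files

I'm currently working on a small C# exercise that searches a text file for relevant terms/words: the program should write out every sentence in the text file that contains the searched word. For example, if I enter the word "example", the program should go through all the sentences in the text file and pull out those that contain the word "example".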

The text file is structured as so: <sentenceDesignator> <text>
sentence 1: bla bla bla bla example of a sentence  //each line contains a sentence
sentence 2: this is not a good example of grammar
sentence 3: bla is not a real word, use better terms

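What I want to be able to do is use a linear search to go through all the lines in the text file and write out every sentence that contains the search term.

My code so far: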

        String filename = @"sentences.txt";

        if (!File.Exists(filename))
        {
            // Since we just created the file, this shouldn't happen.
            Console.WriteLine("{0} not found", filename);
            return;
        }
        else
        {
            Console.WriteLine("Successfully found {0}.", filename);
        }
        //making a list of type "Sentence" to hold all the sentences
        List<Sentence> sentences = new List<Sentence>();

        //the next lines of code...
        StreamReader reader = File.OpenText(filename);

        //first, write out all of the sentences in the text file

        //read a line(sentence) from a line in the text file
        string line = reader.ReadLine();

        while (line != null)
        {
            Sentence s = new Sentence();

            //we need something to split data...
            string[] lineArray = line.Split(':');

            s.sentenceDesignator = lineArray[0];
            s.Text = lineArray[1];

            Console.Write("\n{0}", line);

            line = reader.ReadLine();
        }

        //so far, we can write out all of the sentences in the text file. 
        Console.Write("\n\nOK! Enter a search term to display all its occurrences: ");
        string searchTerm = Console.ReadLine();

       if(!line.Contains(searchterm))
       {
          Console.Write("\nThat term does not exist in any sentence.");
       }
       else
        {
            foreach (Sentence ss in sentences)
            {
                if (ss.sentenceDesignator.Contains(queryName))
                {
                    //I need help here
                }
            }
        }

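2 Answers:

Answer 0 (score: 1)

It would be much faster to build an index of the file and then search the index. With a linear search every search operation is O(n), whereas with an index you pay O(n) once to build it, but each lookup is then somewhere between O(log n) and near-O(1) (depending on how you build the index). The cost is the extra memory the index consumes, but this is how I would do it: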

private Dictionary<String,List<Int32>> _index = new Dictionary<String,List<Int32>>();

/// <summary>Populates an index of words in a text file. Takes O(n) where n is the size of the input text file.</summary>
public void BuildIndex(String fileName) {

    using(Stream inputTextFile = OpenFile(...)) {

        // Positions are word numbers (0, 1, 2, ...) rather than byte offsets,
        // so the consecutive words of a phrase get consecutive positions.
        int currentPosition = 0;
        foreach(String rawWord in GetWords(inputTextFile)) {

            // A foreach iteration variable is read-only, so normalise into a new local.
            String word = rawWord.ToUpperInvariant();
            if( !_index.ContainsKey( word ) ) _index.Add( word, new List<Int32>() );
            _index[word].Add( currentPosition );

            currentPosition++;
        }
    }
}

/// <summary>Searches the text file (via its index) for the specified string (in its entirety). If found, it returns the position (word number, counting from 0) in the document where the string starts. Otherwise it returns -1. Each dictionary lookup is near-O(1) in the size of the input text file; the work grows with the number of terms in the query and with how often those terms occur.</summary>
public Int32 SearchIndex(String query) {

    String[] terms = query.ToUpperInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    if( terms.Length == 0 ) return -1;

    List<Int32> starts;
    if( !_index.TryGetValue( terms[0], out starts ) ) return -1;

    // Try every occurrence of the first term, not just the first one.
    foreach(Int32 startingPosition in starts) {

        // Each following term must sit exactly one word further along.
        Boolean match = true;
        for(Int32 i = 1; i < terms.Length && match; i++) {
            match = ContainsTerm( terms[i], startingPosition + i );
        }
        if( match ) return startingPosition;
    }

    return -1;
}

/// <summary>Indicates if the specified term exists at the specified position.</summary>
private Boolean ContainsTerm(String term, Int32 expectedPosition) {

    if( !_index.ContainsKey(term) ) return false;
    List<Int32> positions = _index[term];
    foreach(Int32 pos in positions) {

        if( pos == expectedPosition ) return true;
    }
    return false;
}

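Implementing OpenFile and GetWords should be trivial. Note that GetWords uses yield return to build an IEnumerable<String> of the whitespace-separated words in the file, as well as dealing with your custom file format.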
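For illustration, here is a minimal sketch of what GetWords might look like. The answer does not show its implementation, so the class name WordReader, the choice to drop everything up to the first ':' (the "sentence N" designator), and wrapping the Stream in a StreamReader are all assumptions rather than the original author's code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class WordReader
{
    // Hypothetical helper: lazily yields the whitespace-separated words of each
    // line, skipping the "sentence N" designator that precedes the first ':'.
    public static IEnumerable<string> GetWords(Stream stream)
    {
        // The reader is deliberately not disposed here: the caller owns the stream.
        var reader = new StreamReader(stream);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            int colon = line.IndexOf(':');
            string text = colon >= 0 ? line.Substring(colon + 1) : line;

            // A null separator array means "split on any whitespace".
            foreach (string word in text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
                yield return word;
        }
    }
}
```

Because the method uses yield return, nothing is read until the caller starts enumerating, so the foreach in BuildIndex would pull words from the file one at a time instead of loading them all up front.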

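Answer 1 (score: 0)

I'm a bit confused by the final if/else. You seem to be comparing only the last line of the file against searchterm. Also, where does "queryName" come from? Do you want to print the whole sentence ("bla bla bla bla example of a sentence") or just "sentence 1"? Finally, you check whether sentenceDesignator contains queryName; I think you want to check whether the actual Text contains searchterm.

Maybe this will help you: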

var lines = File.ReadAllLines(fileName);
// lines is a string[], so Length avoids needing System.Linq for Count()
var sentences = new List<Sentence>(lines.Length);

foreach (var line in lines)
{
    var lineArray = line.Split(':');
    sentences.Add(new Sentence { sentenceDesignator = lineArray[0], Text = lineArray[1]});
}

foreach (var sentence in sentences)
{
    if (sentence.Text.Contains(searchTerm))
    {
        Console.WriteLine(sentence.sentenceDesignator);
        //Console.WriteLine(sentence.Text);
    }
}
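As a self-contained variant of the same linear search, the sketch below returns the matching lines instead of printing them. The class and method names (SentenceSearch, FindMatches) are hypothetical, and the case-insensitive comparison and the split at the first ':' only are assumptions about what the asker wants, not something either post specifies.

```csharp
using System;
using System.Collections.Generic;

static class SentenceSearch
{
    // Returns every line whose text (the part after the first ':') contains
    // searchTerm, ignoring case. Lines without a ':' are searched whole.
    public static List<string> FindMatches(IEnumerable<string> lines, string searchTerm)
    {
        var matches = new List<string>();
        foreach (string line in lines)
        {
            // Split at the first ':' only, so colons inside the sentence survive.
            string[] parts = line.Split(new[] { ':' }, 2);
            string text = parts.Length == 2 ? parts[1] : parts[0];

            if (text.IndexOf(searchTerm, StringComparison.OrdinalIgnoreCase) >= 0)
                matches.Add(line);
        }
        return matches;
    }
}
```

It could be called as SentenceSearch.FindMatches(File.ReadAllLines(fileName), searchTerm); an empty result then corresponds to the "That term does not exist in any sentence." case in the question.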