How to hold the line numbers of each word

Date: 2014-10-27 01:43:36

Tags: c#

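I am trying to build a concordance. I have a dictionary that contains every word in a text together with how often that word occurs. Now I also need to store the lines in which each word appears. To do that, I want to make a container that stores every line, like this: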

List<String> eachLine = new List<String>();
using (var strReader = new StreamReader(@"pathToFile/Text.txt"))
{
    string line;
    while ((line = strReader.ReadLine()) != null)
    {
        eachLine.Add(line);
    }
}

And this is the dictionary:

Dictionary<string, int> concordanceDictionary = new Dictionary<string, int>();

string lines = File.ReadAllText(@"pathToFile/Text.txt").ToLower();

// SplitWords (not shown) splits the text into individual words
string[] words = SplitWords(lines);

foreach (var word in words)
{
    if (!concordanceDictionary.ContainsKey(word))
    {
        // First occurrence of this word
        concordanceDictionary.Add(word, 1);
    }
    else
    {
        concordanceDictionary[word]++;
    }
}
var list = concordanceDictionary.Keys.ToList();
list.Sort();

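To store the line numbers, I would create a List that holds the indices of the lines in which a word appears, calling Contains for each word in the dictionary to check whether that word occurs in List<String> eachLine.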

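My question is: how do I display each word together with its list of line numbers? Or perhaps you can suggest a more elegant and simpler way to do this.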
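The approach proposed in the question (scan the stored lines once for every word) could look roughly like the sketch below. It is not the asker's code: the sample lines, the helper name FindLineNumbers, and splitting on single spaces are assumptions for illustration. Because it rescans every line for every word, its cost grows with words × lines.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for the lines read from Text.txt.
var eachLine = new List<string> { "the cat sat", "a dog", "the dog ran" };

// Sorted word list, as produced by concordanceDictionary.Keys.ToList() + Sort().
var list = new List<string> { "a", "cat", "dog", "ran", "sat", "the" };

var lineNumbers = FindLineNumbers(list, eachLine);
foreach (var word in list)
    Console.WriteLine($"{word}: {string.Join(", ", lineNumbers[word])}");

// Hypothetical helper: for each word, collect the 1-based numbers of the lines
// that contain it as a whole word.
static Dictionary<string, List<int>> FindLineNumbers(List<string> words, List<string> lines)
{
    var result = new Dictionary<string, List<int>>();
    foreach (var word in words)
    {
        var hits = new List<int>();
        for (int i = 0; i < lines.Count; i++)
        {
            // Split the line into words so that "cat" does not match "category".
            var tokens = lines[i].ToLower().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (tokens.Contains(word))
                hits.Add(i + 1);
        }
        result[word] = hits;
    }
    return result;
}
```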

3 answers:

Answer 0 (score: 0)

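I would use a Dictionary<String,List<Int32>>, where the key is the String of the current word and the List<Int32> is the list of line numbers on which that word appears. To get the number of occurrences, just read the list's Count property: dictionary[ word ].Count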
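A minimal sketch of that data structure in isolation, assuming the (word, line number) pairs have already been produced by some tokenizer; the sample pairs are hypothetical. TryGetValue creates each list the first time a word is seen, and Count on the list gives the occurrence count.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical (word, line number) pairs, as a tokenizer might produce them.
var occurrences = new List<(string Word, int Line)>
{
    ("the", 1), ("cat", 1), ("the", 2), ("dog", 2), ("the", 3)
};

var index = new Dictionary<string, List<int>>();
foreach (var (word, line) in occurrences)
{
    // Create the list the first time a word is seen, then append the line number.
    if (!index.TryGetValue(word, out var lines))
    {
        lines = new List<int>();
        index.Add(word, lines);
    }
    lines.Add(line);
}

// The occurrence count is simply the length of the word's list.
Console.WriteLine($"\"the\" appears {index["the"].Count} times");
```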

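By the way, you don't need to read everything into memory at once (for example as String[] instances). You can simply read the file character by character, recognizing whitespace and line breaks as you go.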

To that end, here is my implementation:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

void Run() {

    Dictionary< String, List<Int32> > dict = new Dictionary< String, List<Int32> >();

    foreach(Tuple<String,Int32> wordOccurrence in GetWords()) {
        String word = wordOccurrence.Item1;
        Int32 line = wordOccurrence.Item2;
        // First time we see this word: start an empty list of line numbers
        if( !dict.ContainsKey( word ) ) dict.Add( word, new List<Int32>() );
        dict[ word ].Add( line );
    }

    foreach(String word in dict.Keys) {

        Console.WriteLine("\"{0}\" appeared {1} times, on these lines:", word, dict[word].Count);
        foreach(Int32 line in dict[word]) Console.WriteLine("\t{0}", line );
        Console.WriteLine("");
    }
}

// Yields each whitespace-separated word together with its 1-based line number
IEnumerable< Tuple<String,Int32> > GetWords() {

    using(StreamReader rdr = new StreamReader("fileName")) {
        StringBuilder sb = new StringBuilder();
        Int32 nc; Char c;
        Int32 lineNumber = 1;
        while( (nc = rdr.Read()) != -1 ) {
            c = (Char)nc;

            if( Char.IsWhiteSpace(c) ) {
                // Whitespace ends the current word, if there is one
                if( sb.Length > 0 ) {
                    yield return Tuple.Create( sb.ToString(), lineNumber );
                    sb.Length = 0;
                }
                if( c == '\n' ) lineNumber++;
            } else {
                sb.Append( c );
            }
        }
        // Emit the last word if the file does not end with whitespace
        if( sb.Length > 0 ) yield return Tuple.Create( sb.ToString(), lineNumber );
    }
}
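A variant of the same character-by-character tokenizer, sketched under a few assumptions: it reads from any TextReader (so a StringReader can stand in for the file), returns value tuples instead of Tuple<String,Int32>, and counts lines from 1. Like the answer, it splits only on whitespace, so punctuation stays attached to words.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Assumption: the text comes from a TextReader, so a StringReader replaces the file here.
var text = "one two\nthree  one\n\nfour";
var pairs = new List<(string Word, int Line)>(GetWords(new StringReader(text)));
foreach (var (word, line) in pairs)
    Console.WriteLine($"{word} @ line {line}");

// Reads one character at a time and yields each whitespace-separated word
// with its 1-based line number.
static IEnumerable<(string Word, int Line)> GetWords(TextReader reader)
{
    var sb = new StringBuilder();
    int lineNumber = 1;
    int nc;
    while ((nc = reader.Read()) != -1)
    {
        char c = (char)nc;
        if (char.IsWhiteSpace(c))
        {
            // Whitespace ends the current word, if any; emit it before counting the newline.
            if (sb.Length > 0)
            {
                yield return (sb.ToString(), lineNumber);
                sb.Clear();
            }
            if (c == '\n') lineNumber++;
        }
        else
        {
            sb.Append(c);
        }
    }
    // Emit the final word when the input does not end with whitespace.
    if (sb.Length > 0) yield return (sb.ToString(), lineNumber);
}
```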

Answer 1 (score: 0)

Here is a console application you can run:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        ReadTextToDictionary read = new ReadTextToDictionary();

        // Search terms, one per line
        var strings = read.TextToListString(@"C:\stackoverflow\first.txt");

        // Text to search, keyed by 1-based line number
        var dictionarys = read.TextToDictionaryString(@"C:\stackoverflow\second.txt");
        foreach (var s in strings)
        {
            // Contains is a substring match
            var compare = dictionarys.Where(a => a.Value.Contains(s));
            foreach (var f in compare)
            {
                Console.WriteLine(s + " in line " + f.Key + " " + f.Value);
            }
        }
        Console.ReadKey();
    }
}

class ReadTextToDictionary
{
    public List<string> TextToListString(string path)
    {
        var lines = File.ReadAllLines(path);
        return lines.ToList();
    }

    public Dictionary<int, string> TextToDictionaryString(string path)
    {
        Dictionary<int, string> dstr = new Dictionary<int, string>();
        var lines = File.ReadAllLines(path);
        int count = 0;
        foreach (var s in lines)
        {
            count++;
            dstr.Add(count, s);
        }

        return dstr;
    }
}
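The lookup above uses String.Contains, which also finds a word inside a longer word ("cat" inside "category"). If whole-word matching is wanted, one option (an assumption, not part of this answer) is a regular expression with \b word boundaries. The helper name LinesContainingWord and the sample lines are hypothetical, and \b only behaves as expected when the search word starts and ends with a letter, digit, or underscore.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical numbered lines, shaped like the result of TextToDictionaryString.
var numberedLines = new Dictionary<int, string>
{
    [1] = "the category is animals",
    [2] = "the cat sat on the mat",
};

var hits = LinesContainingWord(numberedLines, "cat");
Console.WriteLine(string.Join(", ", hits));

// Hypothetical helper: returns the numbers of the lines that contain `word`
// as a whole word, ignoring case.
static List<int> LinesContainingWord(Dictionary<int, string> lines, string word)
{
    // \b marks a word boundary; Regex.Escape neutralizes special characters in `word`.
    var pattern = @"\b" + Regex.Escape(word) + @"\b";
    var result = new List<int>();
    foreach (var pair in lines)
    {
        if (Regex.IsMatch(pair.Value, pattern, RegexOptions.IgnoreCase))
            result.Add(pair.Key);
    }
    return result;
}
```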

Answer 2 (score: 0)

One approach is to use the word as the dictionary key and to store, as the value, a list of every line in which that word appears.

In other words, you would have a Dictionary<string, List<string>>, where the key is a word and the associated list holds all the lines containing that word.

This way you get quick access to those lines, and the occurrence count comes for free (dict[someWord].Count).

For example:

using System;
using System.Collections.Generic;
using System.IO;

// words dictionary: each key is a word, each value is the list of lines containing that word
var words = new Dictionary<string, List<string>>();

using (var strReader = new StreamReader(@"pathToFile/Text.txt"))
{
    string line;

    // Read each line
    while ((line = strReader.ReadLine()) != null)
    {
        // Get each word from the line (RemoveEmptyEntries skips runs of spaces)
        var wordsInLine = line.ToLower().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var word in wordsInLine)
        {
            // If this word already exists, add this line to its list
            if (words.ContainsKey(word))
            {
                words[word].Add(line);
            }
            // Otherwise, add the word with a new list holding this line
            else
            {
                words.Add(word, new List<string> { line });
            }
        }
    }
}
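To cover the display part of the question with this structure, one option is to sort the keys and print each word with its count followed by its lines. This is a sketch that assumes ordinal (culture-insensitive) sort order is acceptable; the sample data and the FormatConcordance name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical contents, shaped like the `words` dictionary built above.
var words = new Dictionary<string, List<string>>
{
    ["dog"] = new List<string> { "the dog ran" },
    ["the"] = new List<string> { "the cat sat", "the dog ran" },
    ["cat"] = new List<string> { "the cat sat" },
};

var report = FormatConcordance(words);
Console.Write(report);

// Hypothetical helper: one block per word in alphabetical order,
// showing the word, its occurrence count, then each line indented.
static string FormatConcordance(Dictionary<string, List<string>> words)
{
    var sb = new StringBuilder();
    foreach (var entry in words.OrderBy(e => e.Key, StringComparer.Ordinal))
    {
        sb.AppendLine($"{entry.Key} ({entry.Value.Count}):");
        foreach (var line in entry.Value)
            sb.AppendLine("    " + line);
    }
    return sb.ToString();
}
```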

If you really want a list of all the lines, you can add them to a list inside the loop above, or do the following:

var allLines = words.SelectMany(w => w.Value).Distinct().ToList();