C# Dictionary allowing seemingly identical keys

Time: 2015-05-08 01:48:04

Tags: c# dictionary

I have created a dictionary, along with code that reads a txt file and adds each word in the file to the dictionary.


    //Set up OpenFileDialog box, and prompt user to select file to open
    DialogResult openFileResult;
    OpenFileDialog file = new OpenFileDialog();
    file.Filter = "txt files (*.txt)|*.txt";
    openFileResult = file.ShowDialog();
    if (openFileResult == DialogResult.OK)
    {
        //If user selected file successfully opened

        //Reset form
        this.Controls.Clear();
        this.InitializeComponent();

        //Read from file, split into array of words
        Stream fs = file.OpenFile();
        StreamReader reader;
        reader = new StreamReader(fs);
        string line = reader.ReadToEnd();
        string[] words = line.Split(' ', '\n');

        //Add each word and frequency to dictionary
        foreach (string s in words)
        {
            AddToDictionary(s);
        }

        //Reset variables, and set-up chart
        ResetVariables();
        ChartInitialize();

        foreach (string s in wordDictionary.Keys)
        {
            //Calculate statistics from dictionary
            ComputeStatistics(s);
            if (dpCount < 50)
            {
                AddToGraph(s);
            }
        }

        //Print statistics
        PrintStatistics();
    }

The AddToDictionary(s) function, which is called once for each word in the text file the program reads, is:

    public void AddToDictionary(string s)
    {
        //Function to add string to dictionary
        string wordLower = s.ToLower();
        if (wordDictionary.ContainsKey(wordLower))
        {
            //Word already seen: increment its count
            wordDictionary[wordLower] = wordDictionary[wordLower] + 1;
        }
        else
        {
            //First occurrence: add it with a count of 1 and list it as unique
            wordDictionary.Add(wordLower, 1);
            txtUnique.Text += wordLower + ", ";
        }
    }

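The problem I'm running into is that the word "have" appears twice in the dictionary. I know that a dictionary cannot hold the same key twice, but for some reason it shows up twice. Does anyone know why this would happen?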

3 Answers:

Answer 0: (score: 3)

If you run:

var sb = new StringBuilder();
sb.AppendLine("test which");
sb.AppendLine("is a test");
var words = sb.ToString().Split(' ', '\n').Distinct();

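and inspect words in the debugger, you will see that some instances of "test" have picked up a trailing \r. This comes from the two-character CRLF line terminator: the split removes the \n but does not handle the \r.

To fix this, change the split to: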

Split(new[] {" ", Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
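
To make the effect concrete, here is a minimal sketch that reproduces the stray \r and shows one way to avoid it. The sample text and the SplitWords helper are assumptions made up for illustration. It also uses a slightly different separator set than the line above: Environment.NewLine is "\r\n" on Windows and "\n" elsewhere, so splitting on the individual characters ' ', '\r' and '\n' handles either line-ending style regardless of platform.

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Hypothetical helper: splits on spaces and on both line-ending characters,
    // dropping the empty entries that consecutive separators would produce.
    public static string[] SplitWords(string text)
    {
        return text.Split(new[] { ' ', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Sample text with a Windows-style CRLF line ending (an assumption).
        string text = "I have\r\nyou have too";

        // Split as in the question: '\n' is removed, '\r' stays attached to "have".
        string[] naive = text.Split(' ', '\n');
        Console.WriteLine(naive.Distinct().Count());  // prints 5: "have\r" and "have" differ

        // Split on both line-ending characters: the two "have" entries now match.
        string[] clean = SplitWords(text);
        Console.WriteLine(clean.Distinct().Count());  // prints 4
    }
}
```

Because "have\r" and "have" are different strings, the dictionary treats them as different keys even though they look identical when displayed in a text box.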

Answer 1: (score: 0)

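Splitting text into words is generally a hard problem if you want to support multiple languages. Regular expressions usually handle this kind of parsing better than a basic String.Split.

For example, in your case you are picking up part of the "new line" sequence as part of a word, and you may also pick up things like non-breaking spaces.

The following code picks out words better than your current .Split. For more information, see How do I split a phrase into words using Regex in C#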

 var words = Regex.Split(line, @"\W+").ToList();

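Also, you should make sure your dictionary is case-insensitive (choose the comparer that fits your needs; culture-aware ones are available too):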

 var dictionary = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
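
The two suggestions in this answer can be combined into a small sketch. The WordCounter class, the CountWords method and the sample text are hypothetical names chosen for illustration, not part of the question's code. Note that Regex.Split can return empty strings when the text starts or ends with non-word characters, so the sketch skips them.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Hypothetical helper: counts words, ignoring case and any non-word separators.
    public static Dictionary<string, int> CountWords(string text)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        // \W+ matches runs of non-word characters, including spaces, '\r' and '\n'.
        foreach (string word in Regex.Split(text, @"\W+"))
        {
            if (word.Length == 0)
            {
                continue;  // leading or trailing separators yield empty entries
            }

            // TryGetValue leaves current at 0 when the word has not been seen yet.
            counts.TryGetValue(word, out int current);
            counts[word] = current + 1;
        }

        return counts;
    }

    public static void Main()
    {
        var counts = CountWords("I have\r\nyou Have too.");
        Console.WriteLine(counts["HAVE"]);  // prints 2: lookups ignore case
    }
}
```

With a case-insensitive comparer the explicit ToLower call is no longer needed, although the key keeps the casing of its first occurrence. One trade-off to keep in mind is that \W also splits on apostrophes, so "don't" becomes "don" and "t".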

Answer 2: (score: 0)

I would be inclined to change the following code:

        //Read from file, split into array of words
        Stream fs = file.OpenFile();
        StreamReader reader;
        reader = new StreamReader(fs);
        string line = reader.ReadToEnd();
        string[] words = line.Split(' ', '\n');

        //Add each word and frequency to dictionary
        foreach (string s in words)
        {
            AddToDictionary(s);
        }

to this:

wordDictionary =
    File
        .ReadAllLines(file.FileName) // ReadAllLines takes a path, so pass the dialog's FileName
        .SelectMany(x => x.Split(new [] { ' ', }, StringSplitOptions.RemoveEmptyEntries))
        .Select(x => x.ToLower())
        .GroupBy(x => x)
        .ToDictionary(x => x.Key, x => x.Count());

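This avoids the line-ending problem entirely, because File.ReadAllLines strips the line terminators itself. It also has the added advantage that it doesn't leave any undisposed streams lying around.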
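
For anyone who wants to try this outside the form, here is a self-contained sketch of the same pipeline run against a temporary file. The FileWordCount class, the CountWordsInFile method and the sample text are assumptions made for illustration, and the sketch uses ToLowerInvariant instead of ToLower so the result does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileWordCount
{
    // Hypothetical helper: builds a word-frequency dictionary from a text file.
    public static Dictionary<string, int> CountWordsInFile(string path)
    {
        return File.ReadAllLines(path)  // handles "\r\n", "\n" and "\r" terminators
            .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .Select(word => word.ToLowerInvariant())
            .GroupBy(word => word)
            .ToDictionary(group => group.Key, group => group.Count());
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Sample content with a CRLF line ending (an assumption).
            File.WriteAllText(path, "I have\r\nyou Have too");

            Dictionary<string, int> counts = CountWordsInFile(path);
            Console.WriteLine(counts["have"]);  // prints 2
        }
        finally
        {
            File.Delete(path);  // clean up the temporary file
        }
    }
}
```

File.ReadAllLines opens and closes the file itself, which is why no explicit stream or reader has to be disposed.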