C#, WinForms: most frequently used word in a text

Date: 2014-11-08 02:20:01

Tags: c# winforms

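I need to display the most frequently used word in a text using C#. I'm using WinForms and VS2012.

The following code runs, but it displays "I like apples."

I could break the text apart word by word so that it displays "apples", but that isn't efficient...

I'm new to programming, so simple code (it must be in C#) would be great :)

Thanks in advance, everyone~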

string[] source = { "I like apples.", "I like red apples.", 
                             "I like red apples than green apples." };

            var frequencies = new Dictionary<string, int>();
            string highestWord = null;
            int highestFreq = 0;

            foreach (string word in source)
            {
                int freq;
                frequencies.TryGetValue(word, out freq);
                freq += 1;

                if (freq > highestFreq)
                {
                    highestFreq = freq;
                    highestWord = word;
                }
                frequencies[word] = freq;
            }

            this.lblFreqWords.Text = highestWord; 

4 answers:

Answer 0 (score: 2)

That's because this line actually iterates over each sentence rather than each word:

foreach (string word in source)  // source is a collection of sentences

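Without rewriting the whole program, the quickest way to get individual words out of your current collection is probably to:

  • join all the sentences into one long string (using string.Join), then
  • split it on spaces to get the individual words (and on "." as well, so the periods are dropped)

Try this: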

var words = string.Join(" ", source).Split(new[] {' ', '.'});

foreach (var word in words)
{
    ...
}
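
One thing to watch for with this split: every sentence ends in a period and the sentences are joined with a space, so `Split` returns an empty string for each ". " pair (and one after the final period). With the sample data that does not change the winner, but with other input the empty string could come out on top. The following is a minimal sketch, assuming you want to discard those entries with `StringSplitOptions.RemoveEmptyEntries`; the class and method names are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Collections.Generic;

static class WordSplitSketch
{
    // Hypothetical helper: joins the sentences and splits them into words,
    // dropping the empty strings that Split produces around ". " pairs.
    public static string[] SplitWords(string[] sentences)
    {
        var separators = new[] { ' ', '.' };
        return string.Join(" ", sentences)
                     .Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // Hypothetical helper: the counting loop from the question, run over words.
    public static string MostFrequent(IEnumerable<string> words)
    {
        var frequencies = new Dictionary<string, int>();
        string highestWord = null;
        int highestFreq = 0;

        foreach (var word in words)
        {
            int freq;
            frequencies.TryGetValue(word, out freq); // freq is 0 for a new word
            freq += 1;
            frequencies[word] = freq;

            if (freq > highestFreq)
            {
                highestFreq = freq;
                highestWord = word;
            }
        }
        return highestWord;
    }

    static void Main()
    {
        string[] source = { "I like apples.", "I like red apples.",
                            "I like red apples than green apples." };

        Console.WriteLine(MostFrequent(SplitWords(source))); // prints "apples"
    }
}
```

In the WinForms code you would assign the result to `this.lblFreqWords.Text` instead of writing it to the console.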

Answer 1 (score: 1)

Try this:

 string[] source = { "I like apples.", "I like red apples.", 
                             "I like red apples than green apples." };

            var frequencies = new Dictionary<string, int>();
            string highestWord = null;
            int highestFreq = 0;

            var message = string.Join(" ", source);
            var splichar = new char[] { ' ', '.' };
            var single = message.Split(splichar);
            foreach (var item in single)
            {
                int freq;
                frequencies.TryGetValue(item, out freq);
                freq += 1;

                if (freq > highestFreq)
                {
                    highestFreq = freq;
                    highestWord = item.Trim();
                }
                frequencies[item] = freq;
            }

            this.lblFreqWords.Text = highestWord;

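Answer 2 (score: 1)

I would probably use LINQ. The following lines return an ordered IEnumerable<KeyValuePair<string, int>> that (in theory) represents each word and its number of occurrences. You will need to add more cases for "special characters" such as punctuation, but it's a good start.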

char[] wordBreaks = new[] { ' ', '.', ',', '\'' };

return source.SelectMany(c => c.Split(wordBreaks))
             .GroupBy(c => c)
             .Select(c => new KeyValuePair<string, int>(c.Key, c.Count()))
             .OrderByDescending(c => c.Value);

Of course, once you have this, you can grab thatValue.First().Key to find the most frequently used word.
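
For reference, here is how the LINQ query might look as a complete program. This is only a sketch under a couple of assumptions: the class and method names are hypothetical, and it passes `StringSplitOptions.RemoveEmptyEntries` (not in the snippet above) so that the empty strings left behind by trailing periods are not counted as words.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LinqFrequencySketch
{
    // Hypothetical helper: returns (word, count) pairs, most frequent first.
    public static List<KeyValuePair<string, int>> CountWords(IEnumerable<string> sentences)
    {
        char[] wordBreaks = { ' ', '.', ',', '\'' };

        return sentences
            // Assumption: skip empty entries so "" is never treated as a word.
            .SelectMany(s => s.Split(wordBreaks, StringSplitOptions.RemoveEmptyEntries))
            .GroupBy(w => w)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)
            .ToList();
    }

    static void Main()
    {
        string[] source = { "I like apples.", "I like red apples.",
                            "I like red apples than green apples." };

        var counts = CountWords(source);
        Console.WriteLine(counts.First().Key);   // prints "apples"
        Console.WriteLine(counts.First().Value); // prints 4
    }
}
```

Note that the grouping is case-sensitive, so "Red" and "red" would be counted separately unless you normalize the case first.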

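Answer 3 (score: 1)

Grant Winney's answer goes into why your program doesn't work, but there is a better way to split out the words than splitting only on spaces and periods. Regular expressions have a symbol, \b, that stands for a "word boundary", and another, \w, that matches any word character (the letters a-z, the digits 0-9, and the underscore). So the pattern \b\w+\b means "a word boundary, followed by one or more word characters, followed by a word boundary".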

    string[] source = { "I like apples.", "I like red apples.", 
                             "I like red apples than green apples.", 
                             "red red red apples, Yum!" };

    var frequencies = new Dictionary<string, int>();
    int highestFreq = 0;

    var combinedString = string.Join(" ", source);
    var matches = Regex.Matches(combinedString, @"\b\w+\b");
    foreach (Match match in matches)
    {
        var word = match.Value;

        int freq;
        frequencies.TryGetValue(word, out freq);
        freq += 1;

        if (freq > highestFreq)
        {
            highestFreq = freq;
        }
        frequencies[word] = freq;
    }
    //This will hold a list of all the words that match 
    var highestWords = frequencies.Where(x=>x.Value == highestFreq).Select(x=>x.Key).ToList();

    Console.WriteLine("Highest freq: {0}", highestFreq);
    foreach(var word in highestWords)
    {
        Console.WriteLine(word);
    }


This drops the . from your sentences. If you want hyphenated words to show up as one word instead of two, you need to change the pattern to \b[\w-]+\b
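
To make the difference between the two patterns concrete, here is a small sketch that runs both against the same sentence. The sample sentence and the helper name `Words` are assumptions chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class HyphenPatternSketch
{
    // Hypothetical helper: returns every match of the pattern as a string.
    public static string[] Words(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        const string text = "A well-known red apple.";

        // \w does not match '-', so the hyphenated word is split in two.
        Console.WriteLine(string.Join("|", Words(text, @"\b\w+\b")));
        // prints "A|well|known|red|apple"

        // Adding '-' to the character class keeps the hyphenated word whole.
        Console.WriteLine(string.Join("|", Words(text, @"\b[\w-]+\b")));
        // prints "A|well-known|red|apple"
    }
}
```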